程序综合的一般原则
原论文第三章的内容
reduction of second-order problem
Program synthesis is a second-order search problem. If we can reduce this second-order search problem to a first-order search problem, we can leverage off-the-shelf constraint solvers that can deal with first-order constraints.
One way of doing this is to use templates, which are hints about the syntactic structure of the artifact to be discovered. Then the problem is reduced to searching for the missing components.
The template-based program synthesis approach builds on top of the template-based program verification approach.
In most cases, however, relevant domain-specific templates are unavailable.
Oracle-Guided Synthesis
find a program satisfying the specification is hard, but verify a program satisfy the specification is easier.
Suppose we have a candidate program, we can build a constraint that, If it is unsatisfiable, then the program is a valid solution to the original problem. If it is satisfiable, then we get a counterexample (i, o) the desired program must not satisfy, which we can take into account in our subsequent search.
If only one possible solution (program) is possible for the given specification, then we can reformulate the constraint such that if is is satisfiable, it produces a input-output pair (i, o) that the solution must satisfy.
counterexample-guided inductive synthesis (CEGIS) is based on observation above.
CEGIS

In CEGIS, we split the synthesis problem into search and verification.
- The first component (often called the solver) starts with a simple first-order specification, which is a simplified version of the original second-order requirement on the desired program. It produces a program candidate that satisfies this simplified specification.
- The second component (often called the verifier) validates that this program candidate satisfies the original specification. If it does not, the verifier produces a counterexample, which is returned back to the solver. The solver adds it to the specification and repeats its search, looking for a new program candidate that, in addition satisfies this counterexample.
The process repeats until either
- the verifier accepts a candidate ⇒ success
- the solver cannot find a candidate ⇒ fail, the original problem is unsolvable.
Oracle
In oracle-guided inductive synthesis (OGIS), the solver first searches for a program candidate using a simplified specification (as described above), and then consults an oracle regarding validity of this candidate.
CEGIS uses correctness oracle, which returns if the candidate program is valid w.r.t. to specification, and a counterexample if it does not.
distinguishing input oracle
Assume a complete but expensive validation oracle that verifies whether a candidate satisfies the specification. such oracle may be challenging to write.
To avoid consulting the validation oracle, we reformulate the problem in terms of program distinguishability on concrete inputs:
Suppose an I/O oracle which can produce correct output on arbitrary input, we can use it to produce some finite pairs of valid input/output. Then
- a program that is consistent with the chosen valid input/output pairs is a good candidate for a program that may be consistent with the entire specification
- an incorrect candidate that happens to be consistent with the chosen set of inputs/outputs can be detected by extending the set with a new distinguishing input
distinguishing input yields different behavior for the current candidate program P and some other possible candidate P’ (which is consistent with the previous chosen inputs/outputs, but unnecessarily the correct one)
If no such distinguishing input exists, it means all programs that are consistent with the chosen set of inputs are semantically equivalent, i.e. either of these:
- any such candidate is valid
- no such program can be constructed (problem unsolvable)
We just need to call the validation oracle once in this final stage.

Further Method
Godefroid and Taly found that for a given subset of candidate functions, there exists a subset of universal distinguishing inputs — a set that is a priori guaranteed to distinguish any two functions from the subset.
see godefroidAutomatedSynthesisSymbolic2012
Syntactic Bias
one of the key ideas to scaling up modern program synthesis is syntactically restricting the space of possible programs.
Sketch
a sketch is a partial program, which expresses the high-level structure of the intended implementation but leaves holes for low-level implementation details. It resembles C like language with one additional feature – a symbol ?? that represent an unknown constant integer value.
a hypothesis space over a richer class of expressions in the sketch is also posssible, by using the integer hole. e.g.

Syntax-Guided Synthesis
The input to the syntax-guided synthesis problem (SyGuS) consists of :
- a background theory T that defines the vocabulary and interpretation of function and relation symbols
- a semantic correctness specification for the desired program given by a logical formula φ
- a syntactic set of candidate implementations given by a context-free grammar G
DSL
consider the following factors:
- balanced expressivity
- choice of operators
- naturalness
- efficiency
see gulwaniProgramSynthesis2017 for description of FlashFill DSL
Optimization
in many settings we are interested not just in any program, but in the best program according to some cost function.
some interesting cost functions:
- program speed : default in superoptimization.
- robustness, i.e. learning should not overfit to provided input-output examples.
- readability
Ranking in program synthesis has been approached with many different techniques
- Markov Chain Monte Carlo(MCMC) technique: Metropolis-Hastings algorithm is used to explore a search space of possible programs. In MCMC sampling, programs are drawn from the space with probability proportional to their cost.
- Version Space Algebras
- Machine Learning
- Metasketch. A metasketch is a generalization of a sketch. Instead of a single partial program, it gives an ordered family of such partial programs, along with a cost function and a gradient, which suggests which sketches in the family may contain a program with a lower cost. it describes a search space as a family of finite sub-spaces that can be explored independently, and provides guiding functions to navigate these sub-spaces in a cost-effective way.