<p><em>Blog posts by <a href="https://www.cs.cornell.edu/~asampson/">Adrian Sampson</a></em></p>
<h1><a href="https://www.cs.cornell.edu/~asampson/blog/gradschool.html">Make Your Grad School Application Sparkle with This One Weird Trick</a></h1>
<p><em>November 25, 2020</em></p>
<p>It’s advice season here in academia. If you’re applying to a PhD in computer science, an admittedly mysterious endeavor, there are so many great compendia that will give you the conventional wisdom—there’s not much I can add to <a href="https://www.cs.cmu.edu/~harchol/gradschooltalk.pdf">Mor Harchol-Balter’s classic</a>, for instance. My first-order advice is to take her advice.</p>
<p>Here, though, is my #1 exclusive hot tip for maximizing your chances of getting into grad school:
<em>Have someone who has read lots of personal statements read your personal statement and give you feedback.</em></p>
<p>Your personal statement is not the most important part of your application—that’s your prior research and research-relevant experience, as conveyed through your recommendation letters—but it is the part you have the most control over once application season is upon you.</p>
<p>The problem with personal statements is that it’s weirdly hard to convey what makes a good one and what makes a bad one.
It’s not too hard for us on admissions committees to conclude “this statement is so exciting that I can’t imagine not admitting this person!” or “this particular statement is a total no-op so I hope the letters say something useful.”
We do that all the time, but I find it nearly impossible to give advice about how to write one of the good ones.
You want to make a convincing case that you have the potential to be a great researcher in computer science.
But that’s not actionable advice—it’s just a wish.</p>
<p>So the only way I know to write a good statement is to start early (seriously!) and get feedback from someone who has read a million of these things.
That describes just about any professor beyond their first year, so start by asking your research mentor if you have one.
At many CS departments, PhD students are also involved in admissions, so if you have friends on that side of the undergrad-to-grad threshold, ask them.
But crucially, it’s not enough to ask a friend who’s good at writing, or even <a href="https://en.wikipedia.org/wiki/Writing_center#Writing_Centers_at_Higher_Education_Institutions">the writing center</a>.
You need to track down one of those weary admissions veterans who has read so many statements that they know all the clichés and can see all the opportunities for making your case.</p>
<p>The problem with this advice is that access to local feedback dispensers is, like so many things in academia, unfairly and inequitably distributed.
On my department’s admissions committee, for example, we often observe that a disappointing personal statement reflects a lack of local expertise at the applicant’s undergraduate institution, not any shortcoming in the applicant.
So our usual advice is that great statements might help an application, but even terrible statements should not hurt an application.</p>
<p>One thing is different this year, though: lots of top CS departments, <a href="https://www.cs.cornell.edu/information/news/newsitem11406/support-program-underrepresented-students-applying-cornell-cs-phd">including mine</a>, have suddenly started offering programs that aim to close this feedback gap.
If you’re applying for a CS PhD and don’t have someone you trust to give you good feedback on your application, these programs will match you with someone who can.
Check out <a href="https://twitter.com/andrewkuznet/status/1321873786304663552">@andrewkuznet’s megathread of similar programs</a>.
It’s likely too late for many of these this year, but look around.
And if you’re a member of any underrepresented group in CS and still looking for last-minute advice, <a href="mailto:asampson@cs.cornell.edu">send me an email</a>.</p>
<h1><a href="https://www.cs.cornell.edu/~asampson/blog/delete.html">Uncrudify the New ACM Digital Library</a></h1>
<p><em>April 25, 2020</em></p>
<p>The ACM Digital Library got a redesign lately that adds a bunch of crud obscuring the content. I made this bookmarklet to help:</p>
<p><a href="javascript:(function()%7Bdocument.querySelectorAll('.pb-ad%2C%20.cookiePolicy-popup%2C%20%23surveyContent%2C%20.recommendations%2C%20header%2C%20.pill-list%2C%20.article__sections%2C%20.share__block%2C%20.issue-item__footer-info').forEach((e)%20%3D%3E%20e.parentNode.removeChild(e))%7D)()%3B" class="bookmarklet">DeLete</a></p>
<p>Drag that thing right there to your bookmarks bar, then click it when you’re on a DL page to turn this:</p>
<p><img src="/~asampson/media/delete/before.png" class="img-responsive" /></p>
<p>Into this:</p>
<p><img src="/~asampson/media/delete/after.png" class="img-responsive" /></p>
<h3 id="changelog">Changelog</h3>
<ul>
<li><em>May 5, 2020:</em> Proactively block the recommendation and feedback overlays before they have loaded.</li>
<li><em>April 24, 2020:</em> Initial release.</li>
</ul>
<h1><a href="https://www.cs.cornell.edu/~asampson/blog/fpgaabstraction.html">FPGAs Have the Wrong Abstraction</a></h1>
<p><em>June 22, 2019</em></p>
<aside>
These are notes from a <a href="/~asampson/media/fpga-openmic.pdf">short talk</a> I’ll do at an “open mic” that some Microsoft folks are hosting at FCRC this weekend.
</aside>
<h2 id="the-computational-fpga">The Computational FPGA</h2>
<p>What is an FPGA?</p>
<p>I don’t think the architecture community has a consensus definition.
Let’s entertain three possible answers:</p>
<p><strong>Definition 1:</strong> <em>An FPGA is a bunch of transistors that you can wire up to make any circuit you want.</em> It’s like a breadboard at nanometer scale. Having an FPGA is like taping out a chip, but you only need to buy one chip to build lots of different designs—and you take an efficiency penalty in exchange.</p>
<p>I don’t like this answer.
It’s neither literally true nor a solid metaphor for how people actually use FPGAs.</p>
<p>It’s not literally true because of course you don’t literally rewire an FPGA—it’s actually a 2D grid of lookup tables connected by a routing network, with some arithmetic units and memories thrown in for good measure.
FPGAs do a pretty good job of faking arbitrary circuits, but they really are faking it, in the same way that a software circuit emulator fakes it.</p>
<p>The answer doesn’t work metaphorically because it oversimplifies the way people actually use FPGAs.
The next two definitions will do a better job of describing what FPGAs are for.</p>
<p><strong>Definition 2:</strong> <em>An FPGA is a cheaper alternative to making a custom chip, for prototyping and low-volume production.</em> If you’re building a router, you can avoid the immense cost of taping out a new chip for it and instead ship an off-the-shelf FPGA programmed with the functionality you need. Or if you’re designing a CPU, you can use an FPGA as a prototype: you can build a real, bootable system around it for testing and snazzy demos before you ship the design off to a fab.</p>
<p>Circuit emulation is the classic, mainstream use case for FPGAs, and it’s the reason they exist in the first place.
The point of an FPGA is to take a hardware design, in the form of HDL code, and to buy cheap hardware that behaves the same as the ASIC you would eventually produce.
You’re unlikely to take <em>exactly</em> the same Verilog code and make it work both on an FPGA and on real silicon, of course, but at least it’s in the same abstraction ballpark.</p>
<p><strong>Definition 3:</strong> <em>An FPGA is a pseudo-general-purpose computational accelerator.</em> Like a GPGPU, an FPGA is good for offloading a certain kind of computation. It’s harder to program than a CPU, but for the right workload, it can be worth the effort: a good FPGA implementation can offer orders-of-magnitude performance and energy advantages over a CPU baseline.</p>
<p>This is a different use case from ASIC prototyping.
Unlike circuit emulation, computational acceleration is an <em>emerging</em> use case for FPGAs.
It’s behind the recent Microsoft successes accelerating <a href="https://www.microsoft.com/en-us/research/project/project-catapult/">search</a> and <a href="https://www.microsoft.com/en-us/research/project/project-brainwave/">deep neural networks</a>.
And critically, the computational use case doesn’t depend on FPGAs’ relationship to real ASICs:
the Verilog code people write for FPGA-based acceleration need not bear any similarity to the kind of Verilog that would go into a proper tapeout.</p>
<p>These two use cases differ sharply in their implications for programming, compilers, and abstractions.
I want to focus on the latter use case, which I’ll call <em>computational FPGA</em> programming.
My thesis here is that the current approach to programming computational FPGAs, which borrows the traditional programming model from circuit emulation, is not the right thing.
Verilog and VHDL are exactly the right thing if you want to prototype an ASIC.
But we can and should rethink the entire stack when the goal is computation.</p>
<h2 id="the-gpufpga-analogy">The GPU–FPGA Analogy</h2>
<p>Let’s be ruthlessly literal.
An FPGA is a special kind of hardware for efficiently executing a special kind of software that resembles a circuit description.
An FPGA configuration is a funky kind of software, but it’s software, not hardware—it’s a program written for a strange ISA.</p>
<p>There’s a strong analogy here to GPUs.
Before deep learning and before dogecoin, there was a time when GPUs were for graphics.
<a href="https://graphics.stanford.edu/papers/gpumatrixmult/gpumatrixmult.pdf">In the early 2000s</a>, people realized they could abuse a GPU as an accelerator for lots of computationally intensive kernels that had nothing to do with graphics: that GPU designers had built a more general kind of machine, for which 3D rendering was just one application.</p>
<p>Computational FPGAs are following the same trajectory.
The idea is to abuse this funky hardware not for circuit emulation but to exploit computational patterns that are amenable to circuit-like execution.
In the form of an SAT analogy:</p>
<p class="showcase">
GPU : graphics :: FPGA : circuit emulation
</p>
<p>To let GPUs blossom into the data-parallel accelerators they are today, people had to reframe the concept of what a GPU takes as input.
We used to think of a GPU taking in an exotic, intensely domain-specific description of a visual effect.
We unlocked their true potential by realizing that GPUs execute programs.
This realization let GPUs evolve from targeting a single <em>application</em> domain to targeting an entire <em>computational</em> domain.
I think we’re in the midst of a similar transition with computational FPGAs:</p>
<p class="showcase">
GPU : massive, mostly regular data parallelism :: FPGA : irregular parallelism with static structure
</p>
<p>The world hasn’t settled yet on a succinct description of the fundamental computational pattern that FPGAs are supposed to be good at.
But it has something to do with potentially-irregular parallelism, data reuse, and mostly-static data flow.
Like GPUs, FPGAs need a hardware abstraction that embodies this computational pattern:</p>
<p class="showcase">
GPU : SIMT ISA :: FPGA : ____
</p>
<p>What’s missing here is an ISA-like abstraction for the software that FPGAs run.</p>
<h2 id="rtl-is-not-an-isa">RTL Is Not an ISA</h2>
<p>The problem with Verilog for computational FPGAs is that it does a good job neither as a low-level hardware abstraction nor as a high-level programming abstraction.
By way of contradiction, let’s imagine what it would look like if RTL were playing each of these roles well.</p>
<p><strong>Role 1:</strong> <em>Verilog is an ergonomic high-level programming model that targets a lower-level abstraction.</em>
In this thought experiment, the ISA for computational FPGAs is something at a lower level of abstraction than RTL: netlists or bitstreams, for example.
Verilog is the more productive, high-level programming model that we expose to humans.</p>
<p>Even RTL experts probably don’t believe that Verilog is a productive way to do mainstream FPGA development. It won’t propel programmable logic into the mainstream.
RTL design may seem friendly and familiar to veteran hardware hackers, but the productivity gap with software languages is immeasurable.</p>
<p><strong>Role 2:</strong> <em>Verilog is a low-level abstraction for FPGA hardware resources.</em> That is, Verilog is to an FPGA as an ISA is to a CPU. It may not be convenient to program in, but it’s a good target for compilers from higher-level languages because it directly describes what goes on in the hardware.
And it’s the programming language of last resort for when you need to eke out the last few percentage points of performance.</p>
<p>And indeed, Verilog is the <em>de facto</em> ISA for today’s computational FPGAs.
The major FPGA vendors’ toolchains take Verilog as input, and compilers from higher-level languages emit Verilog as their output.
<a href="http://www.megacz.com/thoughts/bitstream.secrecy.html">Vendors keep bitstream formats secret</a>, so Verilog is as low in the abstraction hierarchy as you can go.</p>
<p>The problem with Verilog as an ISA is that it is too far removed from the hardware.
The abstraction gap between RTL and FPGA hardware is enormous: it traditionally contains at least synthesis, technology mapping, and place & route—each of which is a complex, slow process.
As a result, the compile/edit/run cycle for RTL programming on FPGAs takes hours or days and, worse still, it’s unpredictable:
the deep stack of toolchain stages can obscure the way that changes in RTL will affect the design’s performance and energy characteristics.</p>
<p>A good ISA should directly expose unvarnished truth about the underlying hardware.
Like an assembly language, it need not be convenient to program in.
But also like assembly, it should be extremely fast to compile and yield predictable results.
If there’s going to be a hope of building higher-level abstractions and compilers, they’re going to need such a low-level target that’s free of surprises.
RTL is not that target.</p>
<h2 id="the-right-abstraction">The Right Abstraction?</h2>
<p>I don’t know what abstraction should replace RTL for computational FPGAs.
Practically, replacing Verilog may be impossible as long as the FPGA vendors keep their lower-level abstractions secret and their sub-RTL toolchains proprietary.
The long-term resolution to this problem might only come when the hardware evolves, as GPUs once did:</p>
<p class="showcase">
GPU : GPGPU :: FPGA : ____
</p>
<p>If computational FPGAs are accelerators for a particular class of algorithmic patterns, there’s no reason to believe that today’s FPGAs are the ideal implementation of that goal.
A new category of hardware that beats FPGAs at their own game could bring with it a fresh abstraction hierarchy.
The new software stack should dispense with FPGAs’ circuit emulation legacy and, with it, their RTL abstraction.</p>
<h1><a href="https://www.cs.cornell.edu/~asampson/blog/mlccode.html">Please Help Me Solve an Old Approximate Computing Problem</a></h1>
<p><em>June 2, 2018</em></p>
<p>In a different era, I worked on a project about <a href="https://dl.acm.org/citation.cfm?id=2644808">approximate storage</a>.
This post is about a problem we never solved during that project—a problem that haunts me to this day.</p>
<p>One of our ideas was to abuse <a href="https://en.wikipedia.org/wiki/Multi-level_cell">multi-level cells (MLCs)</a>, which pack more than one bit into a single physical memory element.
Because memory cells are analog devices, MLC designs amount to a choice of how to quantize the underlying signal to a digital value.
Packing in the analog levels more tightly gives you more bits per cell in exchange for slower reads and writes—or more error.</p>
<figure style="max-width: 400px;">
<img src="/~asampson/media/approxstorage/mlc-precise.svg" alt="guard bands in a precise multi-level cell" />
<figcaption>A precise multi-level cell sizes its guard bands so the value distributions for each level overlap minimally.</figcaption>
<img src="/~asampson/media/approxstorage/mlc-approx.svg" alt="guard bands in an approximate multi-level cell" />
<figcaption>An approximate multi-level cell allows non-negligible error by letting the distributions overlap.</figcaption>
</figure>
<p>Our idea was to pack more levels into a cell, beyond what would be allowed in a traditional, precise memory.
Without adjusting the timing to compensate, we exposed the resulting errors to the rest of the system.
Approximate computing!</p>
<p>The nice thing about these approximate memories is that analog storage errors are more often small than large.
In a four-level (two-bit) cell, for example, when you write a 0 into the cell, you are more likely to read a 1 back later than a 3.
Put differently, error probabilities are monotonic in the value distance.
If $v$ is the value you originally wrote and $v_1$ and $v_2$ are other cell values where $|v - v_1| \ge |v - v_2|$, then the probability of reading $v_1$ is at most the probability of reading $v_2$.
Applications enjoy small errors more than they enjoy large errors, so MLC-style monotonic errors are a good fit.</p>
<p>The problem, however, is that real programs don’t use many two-bit numbers.
It’s not feasible to cram 65,536 levels into a single cell in most technologies, but we’d really like to be able to use 16-bit numbers in our programs.
It’s easy to combine, say, two two-bit cells to represent a four-bit number under ordinary circumstances: just split up the bits or use a <a href="https://en.wikipedia.org/wiki/Gray_code">Gray code</a> to minimize the expected cost of small changes.
But these strategies ruin our nice error monotonicity property:
small changes in one cell might cause large changes in our four-bit number.</p>
<h3 id="stating-the-problem">Stating the Problem</h3>
<p>Let’s compare strategies for encoding $n$-bit numbers onto $c$ cell values of $b$ bits each.
A given code will consist of an encoding function $e$ and a decoding function $d$.
Encoding turns a single $n$-bit number into a $c$-tuple of $b$-bit numbers, so we’ll write $e(x) = \overline{v} = \langle v_1, v_2, \ldots, v_c \rangle$ where each $v_i$ consists of $b$ bits.</p>
<p>We assume that, within a given cell, small errors are more likely than large errors.
We <em>hope</em> that small per-cell errors translate to small errors in the decoded value.
To make this precise, define an overloaded function $\Delta$ that gets the size of errors in either encoded or plain-text values.
For plain numbers, for example, $\Delta(1000, 0110) = 2$; this function is the absolute difference between the values.
For encoded cell-value tuples, $\Delta(\langle 01, 10 \rangle, \langle 10, 01 \rangle) = 2$; the function is the sum of the differences for each cell.
Here’s a formal statement of an error-monotonicity property we’d like:</p>
<div class="kdmath">$$
\Delta(\overline{v}, \overline{v}_1) \ge \Delta(\overline{v}, \overline{v}_2)
\Rightarrow
\Delta(d(\overline{v}), d(\overline{v}_1)) \ge \Delta(d(\overline{v}), d(\overline{v}_2))
$$</div>
<p>In other words, if some cell-level error is smaller than another cell-level error, then it <em>also</em> translates to a smaller error in the space of decoded numbers.</p>
<h3 id="the-options">The Options</h3>
<p>Let’s consider a few options.
For simplicity, I’ll give examples for $n=4$, $c=2$, and $b=2$, but each strategy should generalize to any problem size where $n = c \times b$.</p>
<ul>
<li>
<p>I’ll call the naïve strategy a <em>chunking code</em> because it just breaks the number into $c$ equally-sized pieces.
For example, $e(0110) = \langle 01, 10 \rangle$.
But a small, one-level error in the first cell causes a large error in the represented value.
For example, an error of size one can turn $\langle 01, 10 \rangle$
into $\langle 00, 10 \rangle$.
The decoded error size is $\Delta(0110, 0010) = 4$.
So a distance-one error in the cells can lead to a distance-four error in the value. Such an error is just as likely as a distance-one value error (when the second cell is faulty instead of the first).</p>
</li>
<li>
<p>A <a href="https://en.wikipedia.org/wiki/Gray_code">Gray code</a> tries to avoid situations where incrementing a number makes many cells change simultaneously.
This property minimizes the cost of the most common writes, so it’s a popular strategy for memory coding.
But I contend that, in an abstract sense, it’s the <em>opposite</em> of what we want for error robustness.
A Gray code takes small changes in the value and turns them into small changes in the cells. We want this implication to go the other way around: small changes in the cells should lead to small changes in the corresponding values.
A small change in a cell can still lead to an arbitrarily large change in the represented number.</p>
</li>
<li>
<p>Grasping at straws, we could try a <em>striping code</em> where the bits are interleaved: the first cell holds all the bits at positions that are zero mod $b$; the next cell gets all the bits at 1 mod $b$, and so on.
For example, $e(0011) = \langle 01, 01 \rangle$.
But clearly, a small error in one cell can still lead to a large error in the value.
For example, a single-level error can produce $\langle 10, 01 \rangle$.
And
$d(\langle 10, 01 \rangle) = 1001$, which is a value-space error of 6.</p>
</li>
</ul>
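<p>To make the chunking code’s trouble concrete, here’s a small Python sketch (my own illustration; the function names are hypothetical, not from the paper) of the $n=4$, $c=2$, $b=2$ case, reproducing the distance-one cell error that becomes a distance-four value error:</p>

```python
B = 2                      # bits per cell
MASK = (1 << B) - 1

def encode_chunk(x):
    """Chunking code: split a 4-bit number into two 2-bit cells, high half first."""
    return ((x >> B) & MASK, x & MASK)

def decode_chunk(cells):
    hi, lo = cells
    return (hi << B) | lo

def cell_delta(u, v):
    """Delta on cell tuples: the sum of per-cell level differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

v = encode_chunk(0b0110)   # (0b01, 0b10)
v1 = (v[0] - 1, v[1])      # one-level error in the first cell: (0b00, 0b10)
v2 = (v[0], v[1] - 1)      # one-level error in the second cell: (0b01, 0b01)

assert cell_delta(v, v1) == cell_delta(v, v2) == 1
print(abs(decode_chunk(v) - decode_chunk(v1)))  # 4
print(abs(decode_chunk(v) - decode_chunk(v2)))  # 1
```

<p>Two cell-space errors of identical size yield value-space errors of 4 and 1, so the monotonicity property fails in one direction.</p>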
<h3 id="a-question">A Question</h3>
<p>None of these three options meets the goal I wrote above.
Worse, none of them seems meaningfully <em>closer</em> to satisfying error-monotonicity than any other.
For about five years now, I’ve wondered whether it’s possible to do any better than the naïve chunking code.
I would be equally satisfied with a definitive <em>no</em> as with an existence proof.
But so far, I have no traction at all in either direction and Google has failed me.
Let me know if you have any bright ideas—I’d be thrilled.</p>
<hr />
<h3 id="addendum-july-26-2018">Addendum (July 26, 2018)</h3>
<p>Enormous thanks to the many brilliant people who wrote in with ideas, suggestions, and solutions.
Thanks in particular to <a href="https://homes.cs.washington.edu/~jrw12/">James Wilcox</a>, Brian Rice, Shoshana Izsak, and <a href="http://plrg.eecs.uci.edu">Brian Demsky</a>, who helped me answer the concrete question above.</p>
<p><strong>Error monotonicity, as stated, is impossible.</strong>
The answer to “does a code exist that obeys the error-monotonicity inequality?” is “no.”
Here’s a quick pigeonhole argument.
If $x$ is a number and $e(x)$ is the corresponding codeword, there are $2c$ ways for a one-level error to occur with respect to $e(x)$.
But there are only two numbers ($x-1$ and $x+1$) that are within a distance 1 of $x$.
If $c>1$, then there are more one-level error possibilities in cell space than distance-one “neighbors” in value space.
So in any bijective code, where each of the $2^n$ numbers has exactly one codeword and vice-versa,
some one-level errors are bound to incur value-space errors of magnitude greater than one.</p>
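<p>The pigeonhole argument is easy to corroborate by brute force at a toy size. Here’s a quick Python check (my own hypothetical code, not part of the original discussion) that enumerates every bijective code for $n=2$, $c=2$, $b=1$ and counts how many satisfy error monotonicity:</p>

```python
from itertools import permutations, product

C, B = 2, 1                                      # two cells, one bit each
LEVELS = 1 << B
WORDS = list(product(range(LEVELS), repeat=C))   # all cell tuples

def cell_delta(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def is_monotone(decode):
    """Does this decode map (cell tuple -> value) satisfy error monotonicity?"""
    for v in WORDS:
        x = decode[v]
        for v1 in WORDS:
            for v2 in WORDS:
                if (cell_delta(v, v1) >= cell_delta(v, v2)
                        and abs(x - decode[v1]) < abs(x - decode[v2])):
                    return False
    return True

# Every bijection from the four cell tuples onto the values 0..3.
codes = [dict(zip(WORDS, vals)) for vals in permutations(range(LEVELS ** C))]
print(sum(is_monotone(d) for d in codes))  # 0: no bijective code survives
```

<p>All 24 bijections fail, matching the pigeonhole argument: the codeword for the value 0 has two distinct one-level neighbors, and both would have to decode to values equidistant from 0.</p>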
<p><strong>A generalized question.</strong>
We now know that a code cannot preserve strict error monotonicity, but I’m still curious whether it’s possible to get <em>closer</em> than the naïve codes above.
Consider a generalization that adds an approximation factor, $\alpha$:</p>
<div class="kdmath">$$
\Delta(\overline{v}, \overline{v}_1) \ge \Delta(\overline{v}, \overline{v}_2)
\Rightarrow
\alpha \times \Delta(d(\overline{v}), d(\overline{v}_1)) \ge \Delta(d(\overline{v}), d(\overline{v}_2))
$$</div>
<p>The parameter in this version of the property gives codes more flexibility: they can violate error monotonicity within a factor $\alpha$.
It would be useful to bound $\alpha$ for any given code.
It likely depends on $c$ and $b$, but a code that has a <em>constant</em> $\alpha$ would be awesome—as would a proof that no such code exists.</p>
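<p>The same brute-force approach can pin down $\alpha$ for a specific code at a small size. Here’s a sketch (again my own hypothetical code) that computes the smallest workable $\alpha$ for the naïve chunking code at $n=4$, $c=2$, $b=2$:</p>

```python
from itertools import product

B, C = 2, 2
N = B * C
LEVELS = 1 << B
WORDS = list(product(range(LEVELS), repeat=C))

def encode_chunk(x):
    return ((x >> B) & (LEVELS - 1), x & (LEVELS - 1))

def decode_chunk(cells):
    hi, lo = cells
    return (hi << B) | lo

def cell_delta(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def min_alpha(encode, decode):
    """Smallest alpha such that cell_delta(v, v1) >= cell_delta(v, v2)
    implies alpha * |x - d(v1)| >= |x - d(v2)|, over all x, v1, v2."""
    alpha = 1.0
    for x in range(1 << N):
        v = encode(x)
        for v1, v2 in product(WORDS, repeat=2):
            # Skip v1 == v: bijectivity makes both error sizes zero there.
            if v1 != v and cell_delta(v, v1) >= cell_delta(v, v2):
                alpha = max(alpha, abs(x - decode(v2)) / abs(x - decode(v1)))
    return alpha

print(min_alpha(encode_chunk, decode_chunk))  # 12.0
```

<p>At this size, the chunking code needs $\alpha = 12$: the encodings of 3 and 4 are four cell-levels apart but only one value apart, while the encoding of 15 is just three cell-levels from that of 3 yet twelve values away.</p>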
<p>This is, of course, just one way to generalize the property.
I would also be interested in alternative ways to generalize error monotonicity.</p>
<p><strong>The question isn’t about ECC.</strong>
I got several comments suggesting codes that use more than $2^n$ bits to represent $n$-bit numbers.
This kind of strategy amounts to an error-correcting code.
In the limit, if you add enough redundant bits, you might correct all errors within a bound—you’d recover <em>precise</em> storage under some assumptions on which errors are likely.</p>
<p>While ECC is of course fascinating and practical, it muddies the question I’m hoping to answer, which is about the cool properties of MLC memories for <em>approximate</em> storage.
We know that a single cell is great for keeping likely errors small even <em>without</em> any redundancy, and
the question I’m hooked on is how well that property generalizes to multiple cells.
Adding ECC conflates that question with a separate one about the best way to introduce redundancy.
In the end, a real system would probably want to combine some ECC and some of the “natural” error-minimizing properties of MLC storage, but let’s keep the two ideas separate and consider <em>only</em> bijective codes.</p>
<h1><a href="https://www.cs.cornell.edu/~asampson/blog/minisynth.html">Program Synthesis is Possible</a></h1>
<p><em>May 9, 2018</em></p>
<p><a href="https://homes.cs.washington.edu/~bornholt/post/synthesis-explained.html">Program synthesis</a> is not only a hip session title at programming languages conferences. It’s also a broadly applicable technique that people from many walks of computer-science life can use.
But it can seem like magic: automatically generating programs from specifications sounds like it might require a PhD in formal methods.
<a href="https://www.cs.wisc.edu">Wisconsin</a>’s <a href="http://www.cs.wisc.edu/~aws">Aws Albarghouthi</a> wrote a wonderful <a href="http://barghouthi.github.io/2017/04/24/synthesis-primer/">primer on synthesis</a> last year that helps demystify the basic techniques with example code.
Here, we’ll expand on Aws’s primer and build a tiny but complete-ish synthesis engine from scratch.</p>
<p>You can follow along with <a href="https://github.com/sampsyo/minisynth">my Python code</a> or start from an empty buffer.</p>
<h2 id="z3-is-amazing">Z3 is Amazing</h2>
<p>We won’t <em>quite</em> start from scratch—we’ll start with <a href="https://github.com/Z3Prover/z3">Z3</a> and its Python bindings.
Z3 is a <a href="https://en.wikipedia.org/wiki/Satisfiability_modulo_theories">satisfiability modulo theories (SMT) solver</a>, which is like a SAT solver with <em>theories</em> that let you express constraints involving integers, bit vectors, floating point numbers, and what have you.
We’ll use Z3’s Python bindings.
On a Mac, you can install everything from <a href="https://brew.sh">Homebrew</a>; on other platforms, the <code>z3-solver</code> package on PyPI should get you the bindings:</p>
<pre><code class="language-sh">$ brew install z3
$ pip install z3-solver
</code></pre>
<p>Let’s <a href="https://github.com/sampsyo/minisynth/blob/master/ex0.py">try it out</a>:</p>
<pre><code class="language-python">import z3
</code></pre>
<p>To use Z3, we’ll write a logical formula over some variables and then solve it to get a <em>model</em>, which is a valuation of the variables that makes the formula true.
Here’s one formula, for example:</p>
<pre><code>formula = (z3.Int('x') / 7 == 6)
</code></pre>
<p>The <code>z3.Int</code> call introduces a Z3 variable.
Running this line of Python doesn’t actually do any division or equality checking; instead, the Z3 library overloads Python’s <code>/</code> and <code>==</code> operators on its variables to produce a proposition.
So <code>formula</code> here is a logical proposition of one free integer variable, $x$, that says that $x \div 7 = 6$.</p>
<p>Let’s solve <code>formula</code>.
We’ll use a little function called <code>solve</code> to invoke Z3:</p>
<pre><code>def solve(phi):
s = z3.Solver()
s.add(phi)
s.check()
return s.model()
</code></pre>
<p>Z3’s solver interface is much more powerful than what we’re doing here, but this is all we’ll need to get the model for a single problem:</p>
<pre><code>print(solve(formula))
</code></pre>
<p>On my machine, I get:</p>
<pre><code>[x = 43]
</code></pre>
<p>which is admittedly a little disappointing, but at least it’s true: using integer division, $43 \div 7 = 6$.</p>
<p>Z3 also has a theory of bit vectors, as opposed to unbounded integers, which supports shifting and whatnot:</p>
<pre><code>y = z3.BitVec('y', 8)
print(solve(y << 3 == 40))
</code></pre>
<p>There are even logical quantifiers:</p>
<pre><code>z = z3.Int('z')
n = z3.Int('n')
print(solve(z3.ForAll([z], z * n == z)))
</code></pre>
<p>Truly, Z3 is amazing.
But we haven’t quite synthesized a program yet.</p>
<h2 id="sketching">Sketching</h2>
<p>In the <a href="https://people.csail.mit.edu/asolar/papers/thesis.pdf">Sketch</a> spirit, we’ll start by synthesizing <em>holes</em> to make programs equivalent.
Here’s the scenario: you have a slow version of a program you’re happy with; that’s your specification.
You can <em>sort of</em> imagine how to write a faster version, but a few of the hard parts elude you.
The synthesis engine’s job will be to fill in those details so that the two programs are equivalent on every input.</p>
<p>Take <a href="http://barghouthi.github.io/2017/04/24/synthesis-primer/">Aws’s little example</a>:
you have the “slow” expression <code>x * 2</code>, and you know that there’s a “faster” version to be had that can be written <code>x << ??</code> for some value of <code>??</code>.
<a href="https://github.com/sampsyo/minisynth/blob/master/ex1.py">Let’s ask Z3</a> what to write there:</p>
<pre><code>x = z3.BitVec('x', 8)
slow_expr = x * 2
h = z3.BitVec('h', 8) # The hole, a.k.a. ??
fast_expr = x << h
goal = z3.ForAll([x], slow_expr == fast_expr)
print(solve(goal))
</code></pre>
<p>Nice! We get the model <code>[h = 1]</code>, which tells us that the two programs produce the same result for every byte <code>x</code> when we left-shift by 1.
That’s (a very simple case of) synthesis: we’ve generated a (subexpression of a) program that meets our specification.</p>
<p>Without a proper programming language, however, it doesn’t feel much like generating programs.
We’ll fix that next.</p>
<h2 id="a-tiny-language">A Tiny Language</h2>
<p>Let’s <a href="https://github.com/sampsyo/minisynth/blob/master/ex2.py">conjure a programming language</a>.
We’ll need a parser; I choose <a href="https://github.com/lark-parser/lark">Lark</a>.
Here’s my Lark grammar for a little language of arithmetic expressions, which I ripped off from the <a href="https://github.com/lark-parser/lark/blob/master/examples/calc.py">Lark examples</a> and which I offer to you now for no charge:</p>
<pre><code class="language-python">GRAMMAR = """
?start: sum
?sum: term
  | sum "+" term -> add
  | sum "-" term -> sub
?term: item
  | term "*" item -> mul
  | term "/" item -> div
  | term ">>" item -> shr
  | term "<<" item -> shl
?item: NUMBER -> num
  | "-" item -> neg
  | CNAME -> var
  | "(" start ")"
%import common.NUMBER
%import common.WS
%import common.CNAME
%ignore WS
""".strip()
</code></pre>
<p>You can write arithmetic and shift operations on literal numbers and variables. And there are parentheses!
Lark parsers are easy to use:</p>
<pre><code>import lark
parser = lark.Lark(GRAMMAR)
tree = parser.parse("(5 * (3 << x)) + y - 1")
</code></pre>
<p>As for any good language, you’ll want an interpreter.
<a href="https://github.com/sampsyo/minisynth/blob/master/ex2.py">Here’s one</a> that processes Lark parse trees and takes a function in as an argument to look up variables by their names:</p>
<pre><code>def interp(tree, lookup):
    op = tree.data
    if op in ('add', 'sub', 'mul', 'div', 'shl', 'shr'):
        lhs = interp(tree.children[0], lookup)
        rhs = interp(tree.children[1], lookup)
        if op == 'add':
            return lhs + rhs
        elif op == 'sub':
            return lhs - rhs
        elif op == 'mul':
            return lhs * rhs
        elif op == 'div':
            return lhs / rhs
        elif op == 'shl':
            return lhs << rhs
        elif op == 'shr':
            return lhs >> rhs
    elif op == 'neg':
        sub = interp(tree.children[0], lookup)
        return -sub
    elif op == 'num':
        return int(tree.children[0])
    elif op == 'var':
        return lookup(tree.children[0])
</code></pre>
<p>As everybody already knows from their time in <a href="http://www.cs.cornell.edu/courses/cs6110/2018sp/">CS 6110</a>, your interpreter is just an embodiment of your language’s big-step operational semantics.
It works:</p>
<pre><code>env = {'x': 2, 'y': -17}
answer = interp(tree, lambda v: env[v])  # 5 * (3 << 2) + -17 - 1 == 42
</code></pre>
<p>Nifty, but there’s no magic here yet.
Let’s add the magic.</p>
<h2 id="from-interpreter-to-constraint-generator">From Interpreter to Constraint Generator</h2>
<p>The key ingredient we’ll need is a <em>translation</em> from our source programs into Z3 constraint systems.
Instead of computing actual numbers, we want to produce equivalent formulas.
For this, Z3’s operator overloading is the raddest thing:</p>
<pre><code>formula = interp(tree, lambda v: z3.BitVec(v, 8))
</code></pre>
<p>Incredibly, we get to reuse our interpreter as a constraint generator by just swapping out the variable-lookup function.
Every Python <code>+</code> becomes a plus-constraint-generator, etc.
In general, we’d want to convince ourselves of the <em>adequacy</em> of our translation, but reusing our interpreter code makes this particularly easy to believe.
This similarity between interpreters and synthesizers is a big deal: it’s an insight that <a href="https://homes.cs.washington.edu/~emina/index.html">Emina Torlak</a>’s <a href="https://emina.github.io/rosette/">Rosette</a> exploits with great aplomb.</p>
<h2 id="finishing-synthesis">Finishing Synthesis</h2>
<p>With formulas in hand, we’re almost there.
Remember that we want to synthesize values for holes to make two programs equivalent, so
we’ll need two Z3 expressions that share variables.
I wrapped up an enhanced version of the constraint generator above in a function that also produces the variables involved:</p>
<pre><code>expr1, vars1 = z3_expr(tree1)
expr2, vars2 = z3_expr(tree2, vars1)
</code></pre>
<p>And here’s my hack for allowing holes without changing the grammar: any variable name that starts with an “h” is a hole.
So we can filter out the plain, non-hole variables:</p>
<pre><code>plain_vars = {k: v for k, v in vars1.items()
              if not k.startswith('h')}
</code></pre>
<p>All we need now is a quantifier over equality:</p>
<pre><code>goal = z3.ForAll(
    list(plain_vars.values()),  # For every valuation of variables...
    expr1 == expr2,  # ...the two expressions produce equal results.
)
</code></pre>
<p>Running <code>solve(goal)</code> gets a valuation for each hole.
In <a href="https://github.com/sampsyo/minisynth/blob/master/ex2.py">my complete example</a>, I’ve added some scaffolding to load programs from files and to pretty-print the expression with the synthesized values substituted for the holes.
It expects two programs, the spec and the hole-ridden sketch, on two lines:</p>
<pre><code>$ cat sketches/s2.txt
x * 10
x << h1 + x << h2
</code></pre>
<p>It absolutely works:</p>
<pre><code>$ python3 ex2.py < sketches/s2.txt
x * 10
(x << 3) + (x << 1)
</code></pre>
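<p>You can double-check the synthesizer’s answer by brute force: everything here is an 8-bit value, so just try all 256 bytes (plain Python, no Z3 required):</p>

```python
# (x << 3) + (x << 1) is x*8 + x*2 = x*10, masked to a byte to match
# the wraparound behavior of Z3's 8-bit bitvector arithmetic.
for x in range(256):
    assert ((x << 3) + (x << 1)) & 0xFF == (x * 10) & 0xFF
print("equivalent on all 256 inputs")
```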
<h2 id="better-holes-with-conditions">Better Holes with Conditions</h2>
<p>Our example so far can only synthesize constants, which is nice but unsatisfying.
What if we want to synthesize a shifty equivalent to <code>x * 9</code>, for example?
We might think of a sketch like <code>x << ?? + ??</code>, but there is no pair of literal numbers we can put into those holes to make it equivalent: we’d need the second hole to contain the variable <code>x</code> itself, as in <code>(x << 3) + x</code>. How can we let holes range over a wider variety of expressions, not just constants?</p>
<p>We can get this to work without fundamentally changing our synthesis strategy.
We will, however, need to add conditions to our language.
We’ll need to extend the parser with a ternary operator:</p>
<pre><code>?start: sum
  | sum "?" sum ":" sum -> if
</code></pre>
<p>And I’ll add a very suspicious-looking case to our interpreter:</p>
<pre><code>elif op == 'if':
    cond = interp(tree.children[0], lookup)
    true = interp(tree.children[1], lookup)
    false = interp(tree.children[2], lookup)
    return (cond != 0) * true + (cond == 0) * false
</code></pre>
<p>These funky multiplications are just a terrible alternative to Python’s built-in conditional expression.
I’m using this here instead of a straightforward <code>true if cond else false</code> because this works in both interpreter mode <em>and in Z3 constraint generation mode</em> and behaves the same way.
I apologize for the illegible code but not for the convenience of a single implementation.</p>
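<p>Seen in isolation, the encoding works because in plain-interpreter mode the comparisons produce Python booleans, which multiply as 0 or 1, so exactly one branch survives; with Z3 operands, the same expression builds a formula instead. A tiny standalone version:</p>

```python
def mux(cond, true, false):
    # Arithmetic encoding of a conditional: (cond != 0) and (cond == 0)
    # act as 1 and 0, so exactly one product is nonzero.
    return (cond != 0) * true + (cond == 0) * false

print(mux(1, 10, 20))  # 10
print(mux(0, 10, 20))  # 20
```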
<p>The trick is to use conditional holes to switch between expression forms.
Here’s an implementation of the sketch we want above:</p>
<pre><code>$ cat sketches/s4.txt
x * 9
x << (hb1 ? x : hn1) + (hb2 ? x : hn2)
</code></pre>
<p>Each ternary operator here represents a <code>??</code> hole in the sketch we wanted to write, <code>x << ?? + ??</code>.
By choosing the values of <code>hb1</code> and <code>hb2</code>, the synthesizer can choose whether to use a literal or a variable in that place.
In a proper synthesis system, we’d hide these conditions from the surface syntax—the constraint generator would insert a condition for every <code>??</code>.
By conditionally switching between a wider variety of syntax forms—even using nested conditions for nested expressions—the tool can synthesize complex program fragments in each hole.</p>
<h2 id="keep-synthesizing">Keep Synthesizing</h2>
<p>It may be a toy language, but we’ve built a synthesizer!
Program synthesis is a powerful idea that can come in handy in far-flung domains of computer science.
To learn more about the hard parts that come next, I recommend <a href="https://homes.cs.washington.edu/~bornholt/">James Bornholt</a>’s <a href="https://homes.cs.washington.edu/~bornholt/post/synthesis-explained.html">overview</a>.
And you must check out <a href="https://emina.github.io/rosette/">Rosette</a>, a tool that gives you the scaffolding to write synthesis-based tools without interacting with an SMT solver directly as we did here.</p>
<p>Inspired by <a href="http://www.cs.wisc.edu/~aws">Aws Albarghouthi</a>’s <a href="http://barghouthi.github.io/2017/04/24/synthesis-primer/">primer</a>, I’ll give a little lecture on program synthesis for the last day of this year’s <a href="http://www.cs.cornell.edu/courses/cs6110/2018sp/">CS 6110</a>. Here’s a <a href="https://github.com/sampsyo/minisynth">code</a>-driven introduction whose goal is to convince you that you too can synthesize programs.</p>