Lesson 11: Dynamic Compilers

Gist

Motivation

So far, the compilers you have written in 6120 have been ahead-of-time (AOT) or static compilers: they compile the code to some other form in one phase, and then execution happens later—and has nothing to do with the compiler. Just-in-time (JIT) or dynamic compilers mix together compilation and execution: they translate your code when it's time to run it, so they are intimately involved in program execution.

Why would you want a dynamic compiler? There are two main reasons:

Here are some semi-related terminology rants:

Anatomy of a JIT

Here's what you need to make a basic JIT:

Refining things further, there are two main strategies for JIT compilation: method JITs, which compile one function (method) at a time much as an ahead-of-time compiler would, just lazily; and tracing JITs, which record and compile the particular paths that hot executions actually take through the code.

One final dimension in the JIT design space is tiering. You can think of the interpreter and the compiler in the basic recipe above as different optimization tiers: one that's fast to get started with but slow to execute, and one that's slower to start up but makes actual execution go much faster. To further mitigate the trade-off between startup (compilation) time and steady-state (execution) performance, modern JITs often use many more than two tiers: an interpreter and a whole suite of different compilers, from quick and dirty to slow and awesome. For example, see this fairly old blog post from the JavaScriptCore developers that breaks down the three tiers they had in 2014 and motivated the addition of their fourth tier.
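
To make the tiering idea concrete, here's a toy sketch in Python of counter-based promotion between two tiers. Everything in it (the threshold and the interpret/compile_function hooks) is invented for illustration, and real JITs use far more sophisticated heuristics:

HOT_THRESHOLD = 1000   # made-up tuning knob

call_counts = {}       # function name -> times called so far
compiled = {}          # function name -> faster compiled implementation

def call(name, args, interpret, compile_function):
    """Dispatch a call to the best tier currently available for this function."""
    if name in compiled:
        return compiled[name](*args)                # fast tier
    call_counts[name] = call_counts.get(name, 0) + 1
    if call_counts[name] >= HOT_THRESHOLD:
        # Promote: pay the compilation cost once, benefit on every later call.
        compiled[name] = compile_function(name)
    return interpret(name, args)                    # slow tier, available immediately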

A Bit More on Tracing

As an example, let's try tracing this program:

function main(x) {
  y = x + 1;
  if (x < 100) {
    z = f(y);
  } else {
    z = g(y);
  }
  return z;
}

function f(a) {
  return a - 1;
}

Don't worry about g; we won't use it in this walkthrough.

Imagine that we have decided that the main function is hot enough to warrant compiling. The general JIT idea is that we'll execute it once and emit some code that optimizes for the case that future executions will be similar to the one we observe. In a tracing JIT specifically, the plan is to produce code for the single path through the CFG that the execution takes for that given input, rather than trying to compile the entire function.
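
Before walking through the example by hand, here's a rough sketch of what "execute it once and record what happens" could look like. This is invented Python over a simplified instruction format, not the actual Bril interpreter: ordinary instructions get executed and copied into the trace, while branches get executed but recorded as guards on whichever direction this particular run takes:

def record_trace(instrs, env, execute):
    """Interpret a simplified instruction list while recording a trace.

    Hypothetical sketch: `instrs` is a list of dicts like
    {"op": "add", "dest": "y", "args": ["x", "one"]}, `env` maps variable
    names to values, and `execute` runs one instruction against `env`.
    """
    trace = []
    pc = 0
    while pc < len(instrs):
        instr = instrs[pc]
        if instr["op"] == "br":
            cond = env[instr["args"][0]]
            # Don't record the branch itself; record a guard that future runs
            # take the same side that this run takes.
            trace.append({"op": "guard", "args": instr["args"], "expected": cond})
            pc = instr["then"] if cond else instr["else"]   # made-up target encoding
        elif instr["op"] == "ret":
            trace.append(instr)
            break
        else:
            execute(instr, env)    # run the instruction for real...
            trace.append(instr)    # ...and copy it into the trace
            pc += 1
    return trace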

So say that we run main(42). Then as we trace execution, we will first run:

y = x + 1;

And we start building our trace by emitting exactly the same code. We next need to trace the condition, x < 100. Because the condition is true in this execution, we want to emit a trace that assumes that it will be true—i.e., it only includes the code for the "then" branch of this if. But we also want our emitted trace to check that this assumption still holds. To do this, we'll emit a special guard statement in our trace that "bails out" to deoptimization if the condition does not hold. So our trace after this step looks like:

y = x + 1;
guard(x < 100);

We can continue tracing in the appropriate branch of the if. Let's try an intraprocedural style first—we can just copy and paste the code as we execute it, appending it to the end of the trace:

y = x + 1;
guard(x < 100);
z = f(y);
return z;

This completes the trace from the entry of main. In a proper JIT, we could then compile this straight-line (modulo guards) code to machine instructions. Whenever execution reaches the top of main in the future, we can jump to this code instead of interpreting the function. If the guard fails, we will need to deoptimize—returning to slower, correct execution.
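
For a sense of what that dispatch looks like, here's a hedged Python stand-in for the compiled trace and the fallback logic. The GuardFailure exception and the interpret_main hook are inventions for illustration, not how any particular JIT spells it:

class GuardFailure(Exception):
    """Signals that one of the trace's guards did not hold for this input."""

def f(a):
    return a - 1

def compiled_main_trace(x):
    # Rough stand-in for machine code compiled from the trace above.
    y = x + 1
    if not (x < 100):
        raise GuardFailure()   # guard(x < 100)
    z = f(y)
    return z

def run_main(x, interpret_main):
    """Prefer the compiled trace; deoptimize to the interpreter if a guard fails."""
    try:
        return compiled_main_trace(x)   # fast path recorded from a previous run
    except GuardFailure:
        return interpret_main(x)        # slow but always-correct fallback

Restarting main from the top in the interpreter is only acceptable here because nothing before the guard has externally visible side effects; a real deoptimization mechanism has to reconstruct the interpreter's state at the exact point where the guard failed.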

In this trace, we "stepped over" the invocation of f—but an even more convenient thing to do is to trace interprocedurally, following the control flow through the call. That would produce an even simpler trace like this:

y = x + 1;
guard(x < 100);
a = y;
z = a - 1;
return z;
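
The a = y copy is how the tracer handles the call itself: instead of recording the call instruction, it records copies that bind f's parameter a to the argument y, then keeps tracing inside f's body. In the same invented Python format as the earlier sketch, the call case might look like this (a real tracer would also rename variables so caller and callee names can't collide):

def trace_call(call_instr, callee, trace):
    """Record the effect of a call instead of the call itself (hypothetical sketch)."""
    for param, arg in zip(callee["params"], call_instr["args"]):
        # Bind each parameter to its argument with a plain copy, e.g. a = y.
        trace.append({"op": "id", "dest": param, "args": [arg]})
    return callee["instrs"]   # continue tracing inside the callee's body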

The nice thing about tracing is that it produces straight-line code that can be very amenable to optimization. This code doesn't modify variables (it happens to be in SSA form), for example, so it's safe to reorder things—importantly, we can put the guard up front to fail fast:

guard(x < 100);
y = x + 1;
a = y;
z = a - 1;
return z;

Then a simple LVN pass with canonicalization can clean things up quite a bit:

guard(x < 100);
return x;

So in the case where this trace applies, i.e., x is less than 100, we can eliminate code that would be necessary in a more straightforward compilation.

Tasks

The task is to implement a trace-based speculative optimizer for Bril. You'll implement the same concept as in a tracing JIT, but in a profile-guided AOT setting: profiling, transformation, and execution will be distinct phases. The idea is to implement the "heavy lifting" for a trace-based JIT without needing all the scaffolding that a complete JIT requires, such as on-stack replacement.

Concretely, there are two main phases: a profiling phase, where you modify the reference interpreter to observe an execution and record a trace, and a transformation phase, where you splice the (optimized) trace back into the program using the speculation extension.

Start by reading the documentation for the speculation extension (and watch the video!). That should give you an idea of what's required to augment a program with a speculative execution of an extracted trace. Then make a plan for how you'll hack the interpreter to produce one of those traces.
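
To give a flavor of the transformation phase, here's one possible shape for the trace-splicing pass as Python over Bril's JSON form. The trace file format, the label name, and the assumption that the trace covers main's whole execution up to its return are all illustrative choices, not requirements of the task:

import json
import sys

def stitch_trace(func, trace, ret_var):
    """Prepend a speculative copy of `trace` to a Bril function in JSON form.

    Assumptions: `trace` is a list of Bril instructions whose branches were
    already replaced with `guard` instructions that jump to "trace_bail",
    it contains no `ret`, and `ret_var` names the variable holding the
    traced return value.
    """
    stitched = [{"op": "speculate"}]
    stitched += trace
    stitched += [
        {"op": "commit"},
        {"op": "ret", "args": [ret_var]},   # speculation succeeded
        {"label": "trace_bail"},            # failed guards land here
    ]
    stitched += func["instrs"]              # original body: the always-correct fallback
    func["instrs"] = stitched

if __name__ == "__main__":
    prog = json.load(sys.stdin)             # the Bril program, as JSON
    with open(sys.argv[1]) as fh:
        profile = json.load(fh)             # invented format for the recorded trace
    for func in prog["functions"]:
        if func["name"] == "main":
            stitch_trace(func, profile["trace"], profile["ret_var"])
    json.dump(prog, sys.stdout, indent=2)

The appealing property of this layout is that a failed guard discards everything since the speculate and jumps to trace_bail, where the untouched original body takes over, so the transformed program still behaves correctly on inputs the trace does not cover.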

Here's a recipe: