Lesson 14: Fast Compilers
Gist
- The motivation for fast compilers.
- The motivation in dynamic compilers is probably self-evident.
- In AOT compilers, I personally think compiler performance matters in two regimes:
- For small projects, it affects programmer productivity via the compile-edit-run cycle. (Can you compare how you felt while programming in, say, Scala or C++ versus Go or OCaml?)
- For large projects, compilation time can be a huge resource expenditure. If it takes multiple hours to rebuild your main product, can you think of “business” reasons why a company might care about compiler efficiency?
- The ingredients in compiler performance.
- Asymptotic complexity of the algorithms.
- Host-language efficiency taxes.
- Complexity of the source language’s grammar (people love to focus on this one).
- Data structures.
- Some evidence-free generalities about compiler performance.
- The terrible thing is that compilers tend to have a flat profile: there is no single performance bottleneck. Making a compiler go faster usually requires a huge number of small optimizations.
- See, for example, Nicholas Nethercote’s series of blog posts titled “How to speed up the Rust compiler in…”, which illustrate some of these small wins in a production compiler.
- This is related to why compilers are so often single-threaded, and why there are almost no (but not zero) GPU-accelerated compilers.
- It is therefore a good idea to proactively focus on performance at the design stage (even though premature optimization is famously "the root of all evil").
- Let’s focus on data structures.
- One reason: It’s a cross-cutting way to address performance across a flat profile.
- We will learn about one extremely simple trick that you can apply in many ways. It is definitely not the only ingredient in a fast compiler, nor is it even clearly the most important one. But it is probably an important one.
- Proceed to this blog post about data structure flattening.
- Construct a simple interpreter, and try it out. (A minimal sketch appears after this list.)
- Introduce a flat representation, and adjust the `interp` function accordingly. (The second sketch after this list shows one way.)
- Discuss:
- Why should this change be good for performance?
- Are there any implications for ergonomics?
- A little performance evaluation.
- A random program generator, so we can skip I/O. (A small generator is sketched after this list.)
- I endorse Hyperfine for quick-and-dirty measurements like this: `hyperfine -w1 -r3 './target/release/flatcalc gen_interp 1234'`
- I endorse samply as a simple way to see where the time goes: `samply record --no-open -- ./target/profiling/flatcalc gen_interp 1234`
- Use my `bench.sh` to reproduce a full set of results.
- Consider a different kind of interpreter that exploits the flat representation (the last sketch after this list is one version). Reflect on how we have essentially reinvented the idea of a bytecode interpreter.
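Here is a minimal sketch of the first step: a conventional, pointer-chasing AST for a tiny arithmetic language and a recursive `interp` over it. The `Expr` and `interp` names are illustrative; this is not the actual flatcalc code from the blog post.

```rust
// A tiny expression language with the usual Box-based, pointer-chasing AST.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// A straightforward recursive interpreter.
fn interp(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(l, r) => interp(l) + interp(r),
        Expr::Mul(l, r) => interp(l) * interp(r),
    }
}

fn main() {
    // Build (1 + 2) * 3 and evaluate it.
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(1)), Box::new(Expr::Num(2)))),
        Box::new(Expr::Num(3)),
    );
    println!("{}", interp(&e)); // prints 9
}
```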
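The flattened version replaces the `Box` pointers with small integer indices into a single `Vec` that owns every expression. This is a sketch under the same illustrative naming; the blog post's actual types may differ.

```rust
// Flattened AST: one Vec owns all expressions; children are u32 indices.
#[derive(Copy, Clone)]
struct ExprRef(u32);

enum Expr {
    Num(i64),
    Add(ExprRef, ExprRef),
    Mul(ExprRef, ExprRef),
}

struct Pool {
    exprs: Vec<Expr>,
}

impl Pool {
    // Append an expression and return its index.
    fn add(&mut self, e: Expr) -> ExprRef {
        self.exprs.push(e);
        ExprRef((self.exprs.len() - 1) as u32)
    }
    fn get(&self, r: ExprRef) -> &Expr {
        &self.exprs[r.0 as usize]
    }
}

// `interp` now takes the pool alongside a reference into it.
fn interp(pool: &Pool, r: ExprRef) -> i64 {
    match pool.get(r) {
        Expr::Num(n) => *n,
        Expr::Add(l, r) => interp(pool, *l) + interp(pool, *r),
        Expr::Mul(l, r) => interp(pool, *l) * interp(pool, *r),
    }
}

fn main() {
    let mut pool = Pool { exprs: Vec::new() };
    // Build (1 + 2) * 3 bottom-up: children get added before their parents.
    let one = pool.add(Expr::Num(1));
    let two = pool.add(Expr::Num(2));
    let sum = pool.add(Expr::Add(one, two));
    let three = pool.add(Expr::Num(3));
    let prod = pool.add(Expr::Mul(sum, three));
    println!("{}", interp(&pool, prod)); // prints 9
}
```

Allocations become cheap `Vec` pushes, references shrink from 64-bit pointers to 32-bit indices, and the whole tree sits contiguously in memory, which is friendlier to the cache.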
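For the performance evaluation, a random generator can build expressions directly into the pool so the benchmark involves no parsing or other I/O. This sketch reuses the `Pool`, `Expr`, and `ExprRef` types above and a tiny xorshift PRNG to stay dependency-free; the seed and depth in the usage comment are arbitrary.

```rust
// Minimal xorshift64 PRNG: just enough randomness for generating test inputs.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

// Generate a random expression of bounded depth, straight into the pool.
fn gen(pool: &mut Pool, rng: &mut Rng, depth: u32) -> ExprRef {
    if depth == 0 || rng.next() % 4 == 0 {
        pool.add(Expr::Num((rng.next() % 100) as i64))
    } else {
        let l = gen(pool, rng, depth - 1);
        let r = gen(pool, rng, depth - 1);
        if rng.next() % 2 == 0 {
            pool.add(Expr::Add(l, r))
        } else {
            pool.add(Expr::Mul(l, r))
        }
    }
}

// Usage: let root = gen(&mut pool, &mut Rng(1234), 20);
```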
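Finally, here is a sketch of an interpreter that exploits the flat layout directly, again reusing the types above. Because `Pool::add` pushes children before their parents, a single forward pass over the `Vec` can compute a value for every expression with no recursion at all; at that point the `Vec` of expressions behaves a lot like a sequence of bytecode instructions. (One caveat: this pass also evaluates expressions that are not reachable from the root.)

```rust
// Evaluate every expression in one pass, in index order.
// Relies on the invariant that children appear before their parents.
fn flat_interp(pool: &Pool, root: ExprRef) -> i64 {
    let mut vals = vec![0i64; pool.exprs.len()];
    for (i, e) in pool.exprs.iter().enumerate() {
        vals[i] = match e {
            Expr::Num(n) => *n,
            Expr::Add(l, r) => vals[l.0 as usize] + vals[r.0 as usize],
            Expr::Mul(l, r) => vals[l.0 as usize] * vals[r.0 as usize],
        };
    }
    vals[root.0 as usize]
}
```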
Tasks
There are no tasks for this lesson. Good luck on your course project!
This is the last lesson of CS 6120. If you’re taking the self-guided version of the course, please fill out this feedback form. I would love to know who has taken the class—and I want your opinions about how to make it better!