Lesson 11: Memory Management

discussion thread
video
tasks due October 26

Gist

A silly tweet by Steve Blackburn, who is one of the world’s preeminent GC researchers.
The calamitous failure of manual memory management.
- See, for example, this finding from Microsoft that 70% of their CVEs are from memory safety bugs.
- The vast majority of languages today have either (a) manual memory management, or (b) automatic, dynamic memory management. Automatic, static memory management is mostly the domain of Rust.
Some terminology:
- collector: Short for garbage collector. The runtime system that calls free for you.
- mutator: The actual program you’re trying to run, whose garbage is being collected.
- live: Will be accessed again in the future.
- dead: Otherwise.
Approaches to garbage collection.
- Reference counting vs. mark/sweep a.k.a. tracing (sometimes just called “garbage collection” unto itself, but that’s kind of confusing).
  - What metadata do they maintain, when do they run, what do they do, and what do they do when they run?
  - Reference cycles.
- Refinements under the mark/sweep umbrella.
  - Conservative GC.
    - Its applicability to memory-unsafe languages like C/C++, where the compiler does not necessarily know which values are pointers.
    - A cool blog post about Riptide, the GC in Apple’s JavaScript compiler, that mentions that it uses a conservative GC. In fact, counterintuitively, lots of implementations of safe languages use conservative GCs—try to think of reasons why this is the case!
    - The wasted memory seems bad, but “Bounding Space Usage of Conservative Garbage Collectors” by Hans-J. Boehm shows how data structures often don’t have such a big problem.
    - A thread on the Go mailing list demonstrates why conservative GC often works much better on 64-bit than 32-bit architectures: because valid pointers are far more sparse in the space of all integers.
  - Parallel collectors use multithreaded tracing algorithms. (Useful but not that interesting.)
  - Some aspects of “when” the GC runs:
    - Concurrent collection: the mutator runs in parallel with the collector.
    - Incremental collection.
  - Moving/copying/compacting.
    - Semispace.
    - Generational.
      - The “generational hypothesis”: pithily, that most objects die young. More usefully, when a given GC run occurs, recently-allocated objects are more likely to be unreachable than objects that have already lasted a long time.

Tasks

Implement a garbage collector for a Bril interpreter and the Bril memory extension, eliminating the need for the free instruction.
- Most people will want to start with the reference interpreter, which is written in TypeScript. But if you like Rust a lot, you might consider modifying brilirs instead.
- Stick with something simple, like reference counting or a semispace collector. Because Bril lacks objects/structs, it’s not really that meaningful to implement anything much fancier than that.
Check that it works by running the benchmarks that use memory after deleting all their free instructions.
- First, check that the programs still work (and produce the same answers) when you delete free.
- Second, add some tracking to your implementation to be sure that, when the program finishes, all the garbage that should be collected is collected. For example:
  - If you implement typical reference counting: After you finish returning from any function, the RC scheme should have freed anything that is only reachable from that stack frame (except for cycles). So when you return from main, everything should be freed (again, modulo cycles). You can check that this is true and print a warning about memory leaks if it’s not.
  - If you implement a typical tracing scheme: You’ll probably choose some heuristic to determine how often GC runs. Consider amending the heuristic to run one more time after main returns. After this final GC run, all the garbage should be collected. Consider printing an error at this point if there is a nonempty heap at this point.