MCC: A Mostly Copying Garbage Collector

Frederick Smith
fms@cs.cornell.edu
Greg Morrisett
jgm@cs.cornell.edu
This page assumes familiarity with mark-sweep, and copying garbage collection at a minimum. Some idea of how conservative garbage collection works would also be helpful. I highly recommend Paul Wilson's survey as an introduction to garbage collection. A link to it is given here.

Source Code

The source code for MCC 1.0 is availabe here as a small (34K) tarred and gzipped file.

Motivation

Mostly-copying garbage collection is a variant of copying collection designed to perform well in environments where exact information is known for most of the objects, but not all of the objects in the heap. This is the case for many high-level language compilers that compile to C. Although exact type information is known at the source level for all the objects, once compiled to C, type information for local objects is lost since the C compiler controls the stack layout of the data. However, all the type information for heap allocated objects can still be retained.

There are a several solutions to this problem. Some compilers attempt to do their own stack manipulation, but usually significantly reduce the amount of optimization the C compiler can do. Others, put all objects in the heap. This unnecessarily increases the pressure on the garbage collector. Yet other compilers use completely conservative collection. The Boehm-Demers-Weiser Collector is the most commonly used collector of this genre.

Mostly-copying collection is a third possibility. It does a conservative scan of the stack, and conservative portions of the heap (linked in code), but does exact copying collection on the rest of the data.

The purpose of our reserach is to weigh the benefits of this approach versus the others. In particular we are interested in how MCC compares to the Boehm-Demers-Weiser collector. To this end we have modified TIL (from Cornell and Carnegie Mellon Universities), an optimizng ML compiler to use a C-backend which we have then targeted to both Boehm-Demers-Weiser and MCC. The results show that MCC performs better although it requires more space. Large cache effects and different trade-offs made by the two collectors complicate this simple picture significantly. All these details are explained in our paper.

In the future it would be interesting to try this same experiment for Java to see what impact the language has on allocation and hence on garbage collection.

Bartlett et al. have written a C++ mostly-copying collector that has a few algorithmic differences from ours. Follow this link for my explanation of Bartlett's algorithm. They have shown that this type of collector can be made generational, as well as concurrent. We hope to extend our base system sometime in the future to include these features as well. Unfortunately very few performance numbers are available for Bartlett's collector.

A somewhat outdated description of our algorithm can be found here.

Related Work/References

  1. The Boehm-Weiser Collector
  2. The Customizable Memory Manager
  3. Bartlett's Mostly-Copying Garbage Collector
  4. Paul Wilson's garbage collection ftp archive and GC survey(~379Kb).