The Algorithm

Warning: This page is under construction.

The algorithm I am going to describe below can be thought of as a mixed mark-sweep and copying collector. Mark-sweep is used for large objects and objects that cannot be moved because they may be pointed to from a conservative object, and copying collection is used for the remaining objects.

The heap is broken up into multiple pages of fixed size. Associated with each page is a PageInfo structure that serves to keep pages in linked lists, and store some state bits. Small objects(< 1/2 page) are allocated on pages and do not overlap page boundaries. Large objects(>= 1/2 page) are allocated to their own set of pages. Associated with each object is a header word. The header words maintains two bits of state which record whether the object is "normal", pinned(cannot be moved), traversed(marked), or forwarded. It also records the size, type, and bitmask for the object.

Before a garbage collection, every page is in fromSpace, the system has a queue of all the C-Objects within the heap, and all objects are unpinned and untraversed. When a new C-Object is allocated it must be added to the C-Object queue. The garbage collection then proceeds in the following phases.

Phase I Pin all "children" of C-Objects within the heap. This assures that the objects will not be mistakenly moved.
Phase II Scan the stack, registers, and static area, pinning and marking all objects pointed to from these. The marked objects get added to a queue(the ExplicitQueue) to be scanned. All the pages where the marked objects occur are going to be kept so we pin all other objects on that page to avoid copying them unecessarily. The page is also "pinned" by setting the pinned bit in its PageInfo structure.
Phase III Arbitrarily select elements from the ExplicitQueue or Cheney's queue(ImplicitQueue) until no elements remain, performing the following operations on each element(object). For each object process each of its children. If the child is pinned but not marked, mark it and add it to the explicitQueue. Also pin the page it is on and all other objects on the page. If the child is not marked, and not forwarded, forward it and add it to the ImplicitQueue. If the child is already forwarded update the pointer to the child's new location. If the child is a C-Object that has not been marked or forwarded add it to the new list of C-Objects to use after this collection.
Phase IV Move all pinned pages into toSpace by resetting their bits appropriately, and moving them into the toSpace list. Unpin all the pinned and traversed objects(i.e. all the pinned objects that survived the collection). Then flip toSpace to fromSpace.

Analysis

This algorithm need only do one pass over the live data. But it does two passes over the children of C-Objects. Once to pin them, and once to unpin them. In addition since the algorithm maintains two state bits with each object it is a bit harder to modify for a BiBoP(Big Bag of Pages) approach.

An Allocation:

The main advantage of copying collection is that allocation is very fast. Our experiments suggest that this may not be as big an advantage as we once thought. In any case, allocation for MCC is very fast. We store the allocation pointer in a register. We obviate the need for a limit pointer since the end of the page is aligned on the size of the page. So it suffices to test allocPtr+size & PAGE_MASK > allocPtr to determine whether an object fits on a page. Below is the commented SPARC assembly code for allocating a cons-cell(a record of size 2).

	add %g7,12,%o1                Add the size to allocPtr
	and %o1,-2048,%o0             And with PAGE_MASK
	cmp %o0,%g7                   Do the comparison
	bleu .LL3                     Branch if less than or equal
	mov 2113,%o0*                 Load the header word.
	        call gcAllocSpace,0
Uncommon        nop
Case	        add %g7,12,%o1
                mov 2113,%o0
.LL3:
	st %o0,[%g7]                  Store the header word.
	mov %o1,%g7                   Update allocPtr
* This instruction is in the delay slot and gets executed whether the branch is taken or not.

A Garbage Collection:

This table describes the main data structures used in the description below.
AbbreviationFull NameDescription
CO C-Object Queue This is a queue of all conservative objects alive after the last collection, and allocated since.
EQ Explicit Queue This is the queue used for processing marked objects that are not copied.
IQ Implicit Queue This is the Cheney queue used to enqueue all the objects that are being copied. Implicit refers to the fact that this queue does not need any additional system pages.
TO Traversed Object Queue This is a queue of all the objects that have been traversed(marked). In the actual implementation we use EQ for this purpose as well but it simplifies the exposition if we maintain this as a separate data structure.
N/A From-Space The linked list of pages that are in use at the start of collection.
N/A To-Space The linked list of pages that are alive at the end of the collection. This is in fact the same list of pages associated with the implicit queue.
    Initial conditions:
  1. All pages are either User, System, or Free pages.
  2. No objects are pinned.
  3. The Queue of C-Objects, CO, contains all the conservative objects alive at the end of the last collection, and allocated since then.
    A Collection:
  1. Pin all children of elements of CO. At this point CO is emptied so that it can be filled with the surviving C-Objects.
  2. Scan the Stack (pinning and traversing) all the objects found and adding them to EQ and TO. Also pin the pages these objects reside on.
  3. While (EQ is not empty) and (IQ is not empty)
    1. if(EQ is not empty) x = dequeue(EQ)
    2. else x = dequeue(IQ)
    3. processObject(x)
  4. Unpin all the objects in TO.
  5. Process all pages p in From-Space
    1. if(p is pinned)
      Fixup p (make sure there are no misleading headers)
      Add p to the To-Space linked list of pages.
    2. else free p
  6. Flip all pages in To-Space to From-Space.

processObject(x)

x's children have not been processed although x is either traversed, or already in To-Space.
  1. if (x is a C-Object) return processCObject(x,h)
  2. for each child c of x
    1. if (c is forwarded)
      update x's pointer to c
    2. else if(c is not pinned)
      enqueue(IQ, c)
      forward c
      update x's pointer to c
    3. else if(c is not traversed)
      enqueue(EQ', c)
      enqueue(TO, c)
      pin the page c resides on
      mark c as traversed

processCObject(x, header)

  1. enqueue(CO, x)
  2. For each word of the object x that seems to be a valid pointer(p)
    1. if (p is not traversed)
      enqueue(EQ, p)
      enqueue(TO, p)
      mark p as traversed
      pin the pages p resides on

Technical Difficulties:

Finding the stack limits.
The easiest way to find the stack bottom is to use inline assembly to get the stack pointer. To find the stack top just call a dummy function and again use inline assembly. If inline assembly is not possible the best alternative is to just take the address of a local variable, and estimate where the top or bottom should be from that.

Finding the static area limits.
The top of the static area is easy since the compiler exposes this to the programmer through the variable end or maybe _end. Finding the other side is not as easy. The easiest way, and one of the ways used in the Boehm-Weiser collecter, is to scan down from the end you do have, until you get a memory fault. This is surprisingly fast and effective. After all you only have to do it once.

Scanning the registers.
On many machines calling setjmp is enough to push the registers on the stack where the normal scan will pick them up. However for machines like the SPARC with register windows, some assembly code is needed to flush all the register windows. Solaris (what I am currently using) provides a trap(0x3) ST_FLUSH_WINDOWS for exactly this purpose.