Lecture 23:
Memory Organization and Garbage Collection

In our previous discussion of the environment model we encountered many examples of allocated memory that became unreachable by the program. "Unreachable" memory blocks are allocated memory areas that we cannot reach, directly or indirectly, by following pointers available in the current environment. We will call the set of pointers available in the current environment the "roots." Unreachable blocks are garbage.

Tuples, records, strings, arrays, custom datatypes, and closures can all become garbage in an SML program. In time, accumulating garbage will fill up the entire available memory, so some procedure must be put in place to remove it.

We can think of the memory layout of a program as being a graph: the nodes consist of the allocated memory blocks, while the (directed) edges are represented by pointers. Finding the reachable memory blocks is equivalent to starting in the set of roots and traversing the graph following edges to find all reachable nodes.

Memory Organization

Until now we have not worried about hardware-specific details of memory organization; rather, we have treated memory as an otherwise unspecified collection of abstract memory locations.

The typical memory of a modern PC-class computer consists of a sequential array of a few hundred million to a few billion memory locations ("bytes," which are groups of 8 bits). A group of four locations (sometimes only 2, sometimes 8, depending on the computer architecture) forms a memory word. The index giving the position of a memory location in this array is known as its address.

The memory space allocated to each program is divided into three parts: the stack, the heap, and the code area (see image below).

The code region stores programs in machine code that the processor understands and can execute directly.

The stack is used to store bindings for all short-lived variables in every function call (note that the boxed values in the environment model are referred to by pointers in the bindings, but are not part of the bindings themselves). The stack behaves very much like the abstract data structure we have talked about in class earlier: data that is pushed onto it last is removed first.

Certain values are long-lived, and they must survive the end of the function call that created them. These values are stored in the heap. (Note: the memory heap and the binary heap we have studied earlier are completely unrelated.) The heap is broken up into memory blocks of various sizes, some allocated, some free. Special datastructures are maintained to manage the allocation and deallocation of these memory blocks; for example, the set of available (or free) blocks is typically organized as a linked list (the freelist). Since blocks have different sizes, the size of each block must be represented somewhere in the heap datastructures.

Determining which variables are short-lived, and which are not, is not trivial. Consider the example below:

fun giveOutRef(): (int -> int) ref =
  let
    val h: (int -> int) ref = ref (fn _:int => 1)
    val () = let
               val x: int = 3
               fun f(n: int):int = if n = 0 then x
                                   else n * (f (n - 1))
             in
               h := f   (* f, and through it x, escapes via h *)
             end
  in
    h
  end

This example illustrates several points. First, it is not always possible to simply remove the bindings created in a let expression when the evaluation of that expression ends. If we examine the inner let expression, we see that neither the binding for x nor the binding for f can be removed after the expression is evaluated, because a direct or indirect reference to them has been handed out through h. When function giveOutRef terminates, the binding for h itself can be removed, but because the reference held in h was given out, the bindings for x and f must still be maintained, as they could be needed later. All this clearly implies that the environment model cannot be implemented with the help of a pure stack, even for values (like the binding for x) that could otherwise fit in a single word. Long-lived values must be allocated on the heap.

When a request for memory allocation is made, the freelist is searched for an available block of memory. Any available block at least as large as the requested amount of memory will do, but different allocation strategies lead to very different behavior.

The first fit allocation strategy allocates memory from the first free block big enough to satisfy the allocation request (the leftover space becomes a smaller free block available for subsequent memory allocation requests). This strategy, while simple, is prone to quick fragmentation (discussed below).

An alternative strategy is to look for the best fit, i.e. the smallest block greater or equal in size to the requested amount. This strategy decreases the degree of fragmentation compared to the first fit strategy, but it also imposes more overhead, since we have to examine the entire freelist to determine the best fit. In practice we might stop if we found a fit that is "close enough."
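The difference between the two strategies can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the freelist is modeled as a plain array of block sizes, and the names first_fit and best_fit are ours, not part of any real allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical freelist, modeled as an array of free block sizes. */

/* First fit: index of the first block that is large enough, or -1. */
int first_fit(const size_t *free_sizes, int count, size_t request) {
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        if (free_sizes[i] >= request) return i;   /* stop at the first match */
    }
    return -1;
}

/* Best fit: index of the smallest block that is large enough, or -1.
   Unlike first fit, this always examines the entire list. */
int best_fit(const size_t *free_sizes, int count, size_t request) {
    assert(count >= 0);
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (free_sizes[i] >= request &&
            (best == -1 || free_sizes[i] < free_sizes[best])) {
            best = i;
        }
    }
    return best;
}
```

With free blocks of sizes 12, 5, 8, 20 and a request of size 7, first fit chooses the block of size 12 (leaving a fragment of 5), while best fit chooses the block of size 8 (leaving a fragment of 1).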

Searches in the freelist can be sped up if the system maintains several lists of free memory blocks, organized by size. A version of this idea is implemented by the buddy system, in which memory blocks can only have sizes that are powers of two. The requested size is rounded up to the next allowable size, then the freelist for that size is searched for a suitable block. If no free block is available, a block is removed from the freelist of the next larger size and split in two (if no block of the next larger size is available, we go two sizes up, or as many sizes up as necessary to find an available block, splitting repeatedly). One half is allocated to satisfy the original request; the other half is added to the freelist corresponding to its size.
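The splitting step of the buddy system can be sketched as follows. This is a simplified model, not a full allocator: it only counts the free blocks of each size and ignores addresses, and MAX_ORDER, BuddyLists, order_for and buddy_alloc are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 6   /* hypothetical: block sizes 1, 2, 4, ..., 64 */

/* free_count[k] = number of free blocks of size 2^k (addresses omitted). */
typedef struct { int free_count[MAX_ORDER + 1]; } BuddyLists;

/* Smallest order k such that 2^k >= size (size must be at most 2^MAX_ORDER). */
int order_for(size_t size) {
    int k = 0;
    while (((size_t)1 << k) < size) k++;
    return k;
}

/* Returns the order of the allocated block, or -1 if nothing fits. */
int buddy_alloc(BuddyLists *b, size_t size) {
    assert(size > 0);
    if (size > ((size_t)1 << MAX_ORDER)) return -1;
    int want = order_for(size);
    int k = want;
    while (k <= MAX_ORDER && b->free_count[k] == 0) k++;   /* go up as needed */
    if (k > MAX_ORDER) return -1;
    b->free_count[k]--;               /* take the larger block off its list */
    while (k > want) {                /* split: one half stays free, */
        k--;                          /* the other half is split further */
        b->free_count[k]++;
    }
    return want;
}
```

Starting with a single free block of size 64, a request for 5 units is rounded up to 8; the block of 64 is split three times, leaving one free block each of sizes 32, 16 and 8.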

Memory deallocation has its problems as well. A block that is deallocated can simply be added to the freelist. Alternatively, it is possible to examine the freelist to establish whether there are free blocks adjacent to the block that has been freed. If there are, the adjacent free blocks can be merged into a single, larger free block.
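Merging adjacent free blocks could look like the sketch below. We assume, for simplicity, that all blocks (free or allocated) are described by an array sorted by address; the Block type and the coalesce function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block descriptor; the array is kept sorted by address. */
typedef struct { size_t start; size_t size; bool is_free; } Block;

/* Merge runs of adjacent free blocks in place; returns the new block count. */
int coalesce(Block *blocks, int count) {
    assert(count >= 0);
    if (count == 0) return 0;
    int out = 0;                              /* index of the last block kept */
    for (int i = 1; i < count; i++) {
        bool adjacent =
            blocks[out].start + blocks[out].size == blocks[i].start;
        if (blocks[out].is_free && blocks[i].is_free && adjacent) {
            blocks[out].size += blocks[i].size;   /* absorb the neighbor */
        } else {
            blocks[++out] = blocks[i];
        }
    }
    return out + 1;
}
```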

Depending on the sequence of memory allocations and deallocations, it is possible to have enough free memory available, but not to be able to allocate a contiguous memory block of a certain size. Indeed, consider the situation represented in the image below:

We have represented above a memory region of size 30 which has been allocated to four blocks, of sizes 10, 5, 10, and 5, respectively. Assume now that the two blocks of size 5 are freed. If we now want to allocate a memory block of size greater than 5, say, 7, we cannot do it. There is enough free memory (10 units in total), but it is fragmented. We often deal with fragmentation using compacting garbage collectors.
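The situation can be checked with a small sketch that compares the total amount of free memory with the largest contiguous free run. The Region type and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a memory area, listed in address order. */
typedef struct { size_t size; bool is_free; } Region;

/* Sum of the sizes of all free regions. */
size_t total_free(const Region *r, int n) {
    assert(n >= 0);
    size_t sum = 0;
    for (int i = 0; i < n; i++) {
        if (r[i].is_free) sum += r[i].size;
    }
    return sum;
}

/* Size of the largest run of adjacent free regions. */
size_t largest_free_run(const Region *r, int n) {
    assert(n >= 0);
    size_t best = 0, run = 0;
    for (int i = 0; i < n; i++) {
        run = r[i].is_free ? run + r[i].size : 0;
        if (run > best) best = run;
    }
    return best;
}

/* A request succeeds only if some contiguous free run is large enough. */
bool can_allocate(const Region *r, int n, size_t request) {
    return largest_free_run(r, n) >= request;
}
```

For the layout 10, 5, 10, 5 with both blocks of size 5 free, total_free gives 10 but largest_free_run gives only 5, so a request of size 7 fails.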

Returning to the specifics of SML, we must set up a set of rules that map the abstract objects that we handle to specific memory layouts. For example, we will represent all simple values (e.g. integers, booleans) using a single memory word; we will do the same for references (pointers). A tuple having n fields will be represented as a sequence of n memory words. Similar rules can be defined for all SML values. Once these rules are worked out, the size and layout of the blocks needed to store SML values can be determined precisely.
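As an illustration of such rules, the sketch below lays out a tuple as n consecutive words, each holding either a simple value or a pointer to another block. This is only a model of the rules stated above; the Word type and the make_tuple function are hypothetical, and a real SML implementation may choose a different layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Word;   /* one memory word: a simple value or a pointer */

/* Size in bytes of a block holding an n-field tuple. */
size_t tuple_bytes(size_t n) { return n * sizeof(Word); }

/* Allocate a tuple as n consecutive words; returns NULL on failure.
   The caller owns the block and must free() it. */
Word *make_tuple(size_t n, const Word *fields) {
    assert(n > 0 && fields != NULL);
    Word *block = malloc(tuple_bytes(n));
    if (block == NULL) return NULL;
    for (size_t i = 0; i < n; i++) block[i] = fields[i];
    return block;
}
```

Under these assumptions the SML value (3, true, (1, 2)) occupies two blocks: a three-word block holding 3, the word for true, and a pointer, and a two-word block holding 1 and 2.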

Explicit (or Manual) vs. Automatic Garbage Collection

The operations relevant to memory management are memory allocation and deallocation.

Memory allocation is controlled by the programmer, either implicitly (e.g. by declaring new variables), or explicitly, by issuing memory allocation commands. Programmers familiar with C will recognize the code fragment below as typical of explicit memory management:

char *buffer = NULL;
/* attempt to allocate BUFFER_SIZE characters. */
buffer = (char *)malloc(BUFFER_SIZE * sizeof(char));
if (buffer == NULL) {
  /* memory allocation failed */
} else {
  /* memory allocation succeeded */
}

Note: If you do not know C, you do not need to worry about understanding the details of the code above. The important point is that we need to know explicitly how big a memory region we want to allocate, and we need to issue an explicit command for the allocation to occur. As a result of a successful request we get back a pointer to the newly allocated memory block (buffer). In a correct program we need to deal explicitly both with the situation when the memory request succeeds, and with the situation when it fails.

Manual memory management is very flexible, and it allows the programmer to optimize the sequence of memory allocation/deallocation commands, and to adapt these to the semantics of the application at hand. A careful examination of an application's memory requirements often allows, for example, for the reuse of allocated memory blocks at different stages of the program. Such reuse reduces the number of allocation/deallocation commands, and can lead to significant speedup.

The downside of the flexibility and (potential) efficiency of manual memory management is the very fact that the programmer must deal explicitly with low-level details, and has to respect a strict discipline in order to avoid the otherwise all-too-common memory management pitfalls.

Once allocated, a memory region is often passed around (made accessible) to several parts of the program. In practice, programmers often assign "ownership" of a memory region to certain parts of the program, and they (try to) carefully keep track of this information. The code that owns a memory region will then be responsible for deallocating it. Defining and keeping track of such ownership is often a tedious task that complicates program specification and makes implementation challenging.

Errors in manual memory management can occur both at the time of allocation (for example, by allocating too little memory) and at the time of deallocation. We will concentrate on deallocation errors, since our focus is now garbage collection:

To summarize, manual memory management is flexible and can be efficient, but it is error-prone, especially with respect to memory deallocation (garbage collection). Given the speed and large memory sizes common in modern computers, the need to manually optimize memory management has diminished significantly in the case of the most common applications. Meanwhile the relative cost of identifying and debugging memory management problems has increased. These developments have motivated the spread of automatic garbage collection, which we encounter in programming languages as different as SML and Java.

Automatic Garbage Collection

We list below the essential properties that an automated garbage collector should have:

The last point in this enumeration merits further discussion. If garbage collection occurs at unpredictable moments of time, and the program is stopped or significantly slowed down for an extended period of time, then automated garbage collection cannot be used in certain critical applications. Think, for example, of a program flying an airplane, which has to react to events within, say, a tenth of a second. If a garbage-collection round took three seconds to complete, the airplane would be in great trouble. We can reduce the disruption caused by garbage collection rounds by carefully choosing the times when garbage collection is triggered, and by employing techniques that reduce the duration of garbage collection rounds.

It might seem intuitive to initiate garbage collection when the program runs out of memory. This, however, is not the best policy. If memory has been completely used up, it is likely that most of it is garbage. Collecting the garbage from the entire memory at once is likely to be slow. Waiting until all the memory is allocated is also unfair to the other programs running on the same computer, as it would starve those programs of memory, too.

Instead of waiting for all memory to be allocated, garbage collectors are invoked periodically. In this context, the trigger for garbage collection rounds could consist of the issuance, after the previous round of garbage collection,

The disruption caused to running programs can also be mitigated by employing incremental garbage collection, i.e. by developing garbage collection algorithms that run alongside the program, and perform only a small part of the entire garbage collection algorithm whenever they get the chance to run. Another useful idea is to restrict garbage collection to a small portion of the memory at a time. This is most often done using generational garbage collection.

Identifying Pointers

To determine which memory blocks are reachable, a garbage collector must accurately identify pointers. If unambiguous pointer identification is not possible, then the garbage collector should never overlook a pointer, since this could lead to the deallocation of memory blocks that are, in fact, in use. The converse problem, erroneously assuming that a certain bit configuration represents a pointer, is less dangerous, since its worst consequence is a failure to deallocate blocks that are garbage. A garbage collector that might fail to collect some garbage, but which will never deallocate blocks that are live, is called conservative.

One simple strategy for distinguishing a pointer from other values (say, integers) is to allocate one bit of each memory word to a tag. One drawback of this method is that, assuming a typical 32-bit word length, just over 3% of the memory (1 bit in 32) is used up by tags. In addition, only 31 bits are available to represent pointers and other values. This means that the memory range addressable by pointers is halved, as is the representable range of integers. In practice, however, this is rarely a limitation, as the available address space is large enough to comfortably hold most programs, while the typical range of integer values we would want to represent in a program is closer to units than to billions.
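A tagging scheme of this kind can be sketched as follows. The convention used here (the lowest bit of the word is the tag, 1 for integers and 0 for pointers) is an assumption made for the example, as are all the function names; actual implementations differ in which bit they use and in what the tag values mean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Word;   /* a 32-bit memory word: 31 payload bits + 1 tag bit */

/* Assumed convention for this sketch: lowest bit 1 = integer, 0 = pointer. */
Word tag_int(int32_t n) {
    assert(n >= -(1 << 30) && n < (1 << 30));   /* only 31 bits remain */
    return ((Word)n << 1) | 1u;
}

Word tag_ptr(uint32_t addr) {
    assert(addr < (1u << 31));                  /* addressable range is halved */
    return addr << 1;
}

bool is_pointer(Word w) { return (w & 1u) == 0; }

int32_t untag_int(Word w) {
    uint32_t payload = w >> 1;                  /* 31-bit two's complement value */
    int64_t value = payload;
    if (payload & (1u << 30)) value -= (int64_t)1 << 31;   /* sign-extend */
    return (int32_t)value;
}

uint32_t untag_ptr(Word w) { return w >> 1; }
```

The garbage collector only needs is_pointer to decide whether a word should be followed; the program pays for this with the tagging and untagging steps around every integer and pointer operation.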

The compiler (think SML or Java) has full type information available; it could record this information and make it available to the garbage collector. The garbage collector, in turn, could use this information to determine unambiguously which fields of a memory block hold pointers without the need for tagging. This is achieved at the cost of the memory overhead needed to represent type information. There is, however, a more subtle issue here as well: in order for the garbage collector and the compiler to work well together, the garbage collector must be aware of the types that the compiler recognizes. This leads to a so-called "tight coupling" between the compiler and the garbage collector: the garbage collector must be changed whenever the compiler's type system (or even just its internal representation) changes, or whenever the garbage collector is adapted to a new compiler. Ideally, we would like to have garbage collectors that are independent of any particular compiler.

It is possible to build conservative garbage collectors that use heuristics to distinguish between pointers and other values, say, integers. Such heuristics can be built on the observation that typical integers in programs are "small" (often in the range of units, rarely bigger than millions), while pointers look like large integers. If we assume that every bit configuration that looks like a big integer is in fact a pointer, we obtain a conservative heuristic: occasionally we will mistake an integer for a pointer, but we will never mistake a pointer for an integer. We will only free memory blocks that are not pointed to by any (assumed) pointer. Such a garbage collector does not require any memory overhead for storing tags or type information, it does not reduce the usable range of any value, and it does not require tight integration with the compiler; it will, however, occasionally keep around garbage that should have been freed.
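One way to make the "looks like a big integer" test concrete is to treat as a pointer any word whose value falls within the address range of the heap. This particular test is our assumption for the sketch below, not necessarily the heuristic a given collector uses; HeapRange and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heap bounds, supplied by the allocator: [lo, hi). */
typedef struct { uintptr_t lo; uintptr_t hi; } HeapRange;

/* Conservative test: any word that falls inside the heap is assumed
   to be a pointer, whether or not the program uses it as one. */
bool might_be_pointer(uintptr_t word, HeapRange heap) {
    assert(heap.lo <= heap.hi);
    return word >= heap.lo && word < heap.hi;
}

/* Count the words in a scanned area that would be treated as pointers. */
int count_assumed_pointers(const uintptr_t *words, int n, HeapRange heap) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (might_be_pointer(words[i], heap)) count++;
    }
    return count;
}
```

A small integer such as 3 or 1000000 is never mistaken for a pointer under this test, but an integer that happens to equal a heap address is, and the block at that address is then kept alive even if it is garbage.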

Mark & Sweep Garbage Collection

Mark-and-sweep is a two-pass garbage collector. In the first pass each memory block is marked if it is reachable, while in the second pass all unmarked memory blocks are freed. One bit per block is sufficient to record reachability. Marking is typically performed by implementing a depth-first search starting from the set of roots.
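The two passes can be sketched as below. The heap is modeled as an array of blocks with at most two pointer fields each; this model, and the names Block, mark, sweep and collect, are assumptions made for the illustration. Note that the marking pass is written recursively, so it uses stack space proportional to the depth of the search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FIELDS 2   /* hypothetical: each block holds up to two pointers */

typedef struct Block {
    bool allocated;
    bool marked;                        /* the one reachability bit per block */
    struct Block *fields[MAX_FIELDS];   /* NULL when a field holds no pointer */
} Block;

/* Mark pass: depth-first search from one root. */
void mark(Block *b) {
    if (b == NULL || b->marked) return;
    b->marked = true;
    for (int i = 0; i < MAX_FIELDS; i++) mark(b->fields[i]);
}

/* Sweep pass: free every allocated block left unmarked and clear the
   marks for the next round. Returns the number of blocks freed. */
int sweep(Block *heap, int n) {
    int freed = 0;
    for (int i = 0; i < n; i++) {
        if (heap[i].allocated && !heap[i].marked) {
            heap[i].allocated = false;
            freed++;
        }
        heap[i].marked = false;
    }
    return freed;
}

/* One full collection round starting from the given roots. */
int collect(Block *heap, int n, Block **roots, int nroots) {
    assert(n >= 0 && nroots >= 0);
    for (int i = 0; i < nroots; i++) mark(roots[i]);
    return sweep(heap, n);
}
```

Because reachability is computed from the roots, a group of blocks that point only to each other is still collected when no root leads to it.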

One problem with mark-and-sweep is that in general it requires O(n) memory space for the graph traversal (n is the number of allocated blocks). This amount of memory is needed because we need to "remember" the nodes that we are currently exploring so that we can return to them later. But if we are almost out of memory, how can we find O(n) additional memory?

This problem can be solved by the technique of pointer reversal. The basic idea is that we can use the very pointers that represent the edges of the graph to retrace our steps in the graph of memory blocks. The image below illustrates this process:

The current memory block is indicated by a red frame, pointers are represented by grey squares, and the blocks they point to are shown by edges of various colors. An edge colored in black represents a "normal" pointer, i.e. a pointer whose value and meaning is defined by the semantics of the data structure, as set up by the program. A blue edge represents a "return" address (i.e. the address of the block to which we have to return after we are done processing the block from which the "blue" pointer originates). A dotted blue line represents the pointer that we will follow in the next step, to return to the memory block from which we arrived at the current block. A green edge represents a pointer that we will follow in the next step to reach the next memory block. You will note that certain pointers seem to hold two values, indicated by two edges (a blue and a green one) originating from them. We chose this solution to simplify the image; in reality the "green" pointer must be stored in a separate variable before it is overwritten by the "blue" pointer.

The image shows a few intermediate stages of the mark-and-sweep algorithm, and must be read from top to bottom first, then from left to right. By reusing the pointers that are already present in the data structure itself we can implement a depth-first search using a constant amount of additional memory (for example, we need a variable to hold the green pointers before they are overwritten).
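A marking pass based on pointer reversal (a technique commonly known as the Deutsch-Schorr-Waite algorithm) can be sketched as follows. The sketch assumes blocks with exactly two pointer fields, and it stores in each block a small counter recording which field is currently being explored; this counter is a simplification of ours and costs a few bits per block in addition to the mark bit. The names Node and mark_reversal are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NFIELDS 2   /* hypothetical: every block has two pointer fields */

typedef struct Node {
    bool marked;
    int next;                       /* index of the field being explored */
    struct Node *fields[NFIELDS];
} Node;

/* Depth-first marking without a stack: the path back to the root is
   kept in the reversed pointer fields of the blocks along the path. */
void mark_reversal(Node *root) {
    if (root == NULL || root->marked) return;
    Node *prev = NULL;              /* block we came from ("blue" pointer) */
    Node *cur = root;
    cur->marked = true;
    cur->next = 0;
    for (;;) {
        if (cur->next < NFIELDS) {
            int i = cur->next;
            Node *child = cur->fields[i];   /* save the "green" pointer */
            if (child != NULL && !child->marked) {
                cur->fields[i] = prev;      /* reverse the pointer */
                prev = cur;
                cur = child;                /* advance to the child */
                cur->marked = true;
                cur->next = 0;
            } else {
                cur->next++;                /* nothing to visit here */
            }
        } else {
            if (prev == NULL) return;       /* back at the root: done */
            Node *parent = prev;
            int i = parent->next;           /* the field we reversed */
            prev = parent->fields[i];       /* parent's own return pointer */
            parent->fields[i] = cur;        /* restore the original pointer */
            parent->next++;
            cur = parent;                   /* retreat to the parent */
        }
    }
}
```

When the function returns, every reachable block is marked and every pointer field holds its original value again, so the program's data structures are left intact.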

Reference Counting

When using reference counting we need to keep track of the number of pointers that refer to a particular memory block. When this number goes down to zero, then the block can be deallocated, as it has become unreachable. The method involves some memory overhead, as a reference counter must be stored in each memory block.
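The bookkeeping involved can be sketched with a singly linked cell type. The names Cell, cell_new, cell_retain and cell_release are hypothetical, and the global live_cells counter exists only so that the example can be checked.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Cell {
    int refcount;          /* the per-block reference counter */
    int value;
    struct Cell *next;     /* one outgoing pointer, may be NULL */
} Cell;

int live_cells = 0;        /* bookkeeping for this example only */

/* Create a cell pointing to next; the caller holds the first reference,
   and the new cell's pointer counts as one more reference to next. */
Cell *cell_new(int value, Cell *next) {
    Cell *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->refcount = 1;
    c->value = value;
    c->next = next;
    if (next != NULL) next->refcount++;
    live_cells++;
    return c;
}

/* A new pointer to c has been created. */
void cell_retain(Cell *c) {
    if (c != NULL) c->refcount++;
}

/* A pointer to c has been dropped. When the count reaches zero the
   block is freed, and the pointer it held is dropped in turn. */
void cell_release(Cell *c) {
    while (c != NULL) {
        assert(c->refcount > 0);
        if (--c->refcount > 0) break;
        Cell *next = c->next;
        free(c);
        live_cells--;
        c = next;
    }
}
```

Note that every operation that copies or drops a pointer must be paired with a call to cell_retain or cell_release, which is where the run-time cost of the method comes from.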

This simple method has a few important drawbacks, however:

Memory Fragmentation and Compacting Garbage Collection

One way to solve the problem of fragmentation is to move the memory blocks so that the allocated memory region is compacted and the free memory is available in one chunk, as shown above; we can then easily allocate the block of length 7.

The relocation of memory blocks is complicated by the fact that the value of all pointers that refer to blocks that are moved must be adjusted. Finding all such pointers is an expensive operation, both in terms of time and additional storage space required.

One way to simplify the relocation of memory blocks is to use an object table. The object table contains pointers to the true location of each object in memory; all objects that want to refer to other objects point not to the destination directly, but to the corresponding entry in the object table. Since there is only one "true" pointer to each object, and this pointer can be found in the object table, object relocation is greatly simplified. The downside of this method is that following a pointer takes twice as long (we need to follow a pointer into the object table, then another one to the actual object).
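The indirection can be sketched as follows. For brevity each object is a single int and the program refers to objects by their index in the table; the names Handle, ObjectTable, deref and relocate, as well as the table size, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8   /* hypothetical table capacity */

typedef int Handle;    /* what the program stores instead of a pointer */

typedef struct {
    int *entries[TABLE_SIZE];   /* the only "true" pointer to each object */
} ObjectTable;

/* Following a reference takes two steps: table entry, then object. */
int *deref(const ObjectTable *t, Handle h) {
    assert(h >= 0 && h < TABLE_SIZE);
    return t->entries[h];
}

/* Move an object: copy its contents and update a single table entry.
   No other pointer in the program needs to be adjusted. */
void relocate(ObjectTable *t, Handle h, int *new_location) {
    assert(h >= 0 && h < TABLE_SIZE && t->entries[h] != NULL);
    *new_location = *t->entries[h];
    t->entries[h] = new_location;
}
```

After relocate, every holder of the handle sees the object at its new location, because all of them go through the same table entry.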

Generational Garbage Collection

The useful life of an object (the time before it becomes garbage) can vary greatly. It has been established empirically that the longer an object has already existed, the more likely it is (though by no means certain!) that it has a long useful life left. In other words, if a piece of data has survived long enough in our programs, it is likely to be important and it will probably be kept for a long period of time in the future as well.

An important consequence of this observation is that a significant proportion of newly created objects is likely to be garbage, while a significant proportion of old objects is likely to consist of live data. It thus makes sense to segregate objects based on the time elapsed since their creation. One way to achieve this is to break up the available memory into several contiguous regions, which will be allocated to various object "generations."

When an object is created, memory for it will be allocated in the region for generation 0. When the region for generation i runs out of space, a garbage collection round is initiated, and live data is transferred into the space for generation i+1. There is a last generation, say, k, from which objects do not get promoted further, i.e. live data in generation k is kept in place.
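The promotion rule can be sketched as follows. The sketch assumes that some reachability analysis (for example, a marking pass) has already determined which objects are live, and it tracks only each object's generation number, not its actual location; Obj, LAST_GEN and collect_generation are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define LAST_GEN 2   /* hypothetical: generations 0, 1 and 2 (= k) */

typedef struct {
    bool in_use;       /* false once the object has been reclaimed */
    bool live;         /* assumed to be set by a prior marking pass */
    int generation;
} Obj;

/* Collect generation g: dead objects are reclaimed, live objects are
   promoted to generation g+1, except in the last generation, where
   they stay in place. Returns the number of objects reclaimed. */
int collect_generation(Obj *objs, int n, int g) {
    assert(g >= 0 && g <= LAST_GEN);
    int reclaimed = 0;
    for (int i = 0; i < n; i++) {
        if (!objs[i].in_use || objs[i].generation != g) continue;
        if (!objs[i].live) {
            objs[i].in_use = false;
            reclaimed++;
        } else if (g < LAST_GEN) {
            objs[i].generation = g + 1;
        }
    }
    return reclaimed;
}
```

Objects in other generations are not examined at all during the round, which is what keeps the collection of a young generation short.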

One problem with generational garbage collection is pointer management across generational boundaries. In the absence of side effects, objects in generation i can only refer to objects in the same or an older generation, i.e. generation i, i+1, and so on (in other words, in the absence of side effects, newer objects can refer to older objects, but not vice versa). Without side effects, the frequent relocation of "younger" generations is less problematic, as the non-moving "older" generations do not contain pointers to new values.

When side effects are allowed (specifically, references), it becomes possible for data in generation i to hold pointers to data in generation i-1; i.e. older data can refer to newer data (can you give an example illustrating this point?). Because older data can now point to newer data, the relocation of the newer values can trigger pointer value adjustments in older values as well. Side effects thus complicate generational garbage collection.