Lecture 22: Memory Allocation Strategies

Administrivia

Evaluator for Prelim is now on the web. The review session will be held in section on Monday. PS5B will be out later today.

Circular datastructures are actually not needed for mutual recursion (just delays…) But they are needed for, e.g., airline routes.

How computers actually work

Memory and pointers and indirection

Byte versus word addressing (Pentium is 32 bit, byte addressed).

Alignment and word operations versus byte operations (word operations are more efficient)

The memory hierarchy – pretending we have a large fast memory when in fact we have a small fast memory and a large slow memory. This happens at many levels (about 4 times within a Pentium, then main memory, then in the disk drive, etc.) Importance of locality, spatially and temporally.

The SML runtime has only 4 kinds of values: integers, pointers, records and strings. A field is either a pointer or an integer. We tell the two apart using the low-order bit: pointers are word-aligned, so their low-order bit is always 0 and is free to serve as a tag.
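
A minimal sketch of low-bit tagging, assuming 32-bit words and the convention that integers carry a low bit of 1 (the notes only say the low-order bit is used; the exact convention and all function names here are assumptions for illustration):

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def tag_int(n):
    """Encode an integer as a field: shift left one bit, set the low bit to 1."""
    return ((n << 1) | 1) & MASK

def untag_int(field):
    """Recover the signed integer from a tagged field."""
    value = field >> 1
    # Sign-extend from 31 bits, since one bit was spent on the tag.
    if value & (1 << (WORD_BITS - 2)):
        value -= 1 << (WORD_BITS - 1)
    return value

def is_pointer(field):
    """Word-aligned addresses are multiples of 4, so their low bit is 0."""
    return field & 1 == 0
```

One consequence of a scheme like this is that integers get one bit fewer than the machine word.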

Note that we will assume that given a pointer it is easy to determine the type of the object it points to. How can we do this? The easy way is to use the high-order bits of the address (why not the low order?) This scheme is called BIBOP (“BIg Bag Of Pages”).
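
A minimal sketch of the BIBOP idea, assuming 4 KiB pages and a hypothetical page table (neither is specified in these notes): the high-order bits of an address select a page, every object on a page has the same type, and one table lookup gives that type.

```python
PAGE_BITS = 12  # assumed page size of 4096 bytes

page_type = {}  # hypothetical table: page number -> type of every object on it

def set_page_type(page_number, type_name):
    page_type[page_number] = type_name

def type_of(address):
    """Drop the low-order offset bits; what remains is the page number."""
    return page_type[address >> PAGE_BITS]

set_page_type(3, "float")
set_page_type(4, "bool")
```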

How lists, refs and tuples might be implemented

SML doesn’t use BIBOP (you will see why shortly); instead, a record is represented as [Record type tag,length,elt1,elt2,…] In the actual implementation, tag and length are compressed into a single 32-bit word.
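
A minimal sketch of packing a tag and a length into one 32-bit header word. The 8-bit tag / 24-bit length split and the tag value are assumptions for illustration, not the layout of the actual implementation:

```python
TAG_BITS = 8          # assumed width of the tag field
RECORD_TAG = 0x02     # hypothetical tag value for records

def make_header(tag, length):
    """Pack length into the high bits and tag into the low bits of one word."""
    assert 0 <= tag < (1 << TAG_BITS)
    assert 0 <= length < (1 << (32 - TAG_BITS))
    return (length << TAG_BITS) | tag

def header_tag(header):
    return header & ((1 << TAG_BITS) - 1)

def header_length(header):
    return header >> TAG_BITS
```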

Strings usually use 1 byte per character and are null-terminated. Quick overview of string operations. SML strings, unlike C strings, are word-aligned. “Wide” strings and Unicode (source of many bugs!)

The list [6,9,42] will be [CONS,6,ptr] [CONS,9,ptr] [CONS,42,ptr], where each ptr points to the next cell and the last one is nil. Note that this allows you to easily create new lists that share structure with old lists (for example, consider what x::l does…)
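
A minimal sketch of cons cells with structure sharing, simulating memory as a Python list of words. Using address 0 for nil and the helper names below are assumptions for illustration:

```python
CONS = "CONS"
NIL = 0            # assumed: address 0 is reserved to mean the empty list
memory = [None]    # word 0 is reserved so that no cell lives at NIL

def cons(head, tail):
    """Allocate a 3-word cell [CONS, head, tail pointer]; return its address."""
    addr = len(memory)
    memory.extend([CONS, head, tail])
    return addr

def to_python_list(ptr):
    """Follow tail pointers until nil, collecting the heads."""
    out = []
    while ptr != NIL:
        assert memory[ptr] == CONS
        out.append(memory[ptr + 1])
        ptr = memory[ptr + 2]
    return out

l = cons(6, cons(9, cons(42, NIL)))
words_before = len(memory)
x_l = cons(1, l)   # plays the role of x::l
```

Here x::l allocates exactly one new cell whose tail pointer is l; the three old cells are shared, not copied.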

A ref is just a memory location holding a pointer

A tuple is a sequence of memory locations (there is some freedom about how to represent a nested tuple).

How SML environments might be implemented

An environment might be an array of known size, where each element is a string pointer (to the name) and a value (which may in turn be a pointer). An actual compiler will get rid of the names and replace them with offsets.
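
A minimal sketch of the two lookup styles, with hypothetical names and values: lookup by name searches the array, while a compiler can compute each name's offset once and then index directly.

```python
# Environment as an array of (name, value) entries; contents are made up.
env = [("x", 3), ("y", 10), ("z", 7)]

def lookup_by_name(env, name):
    """Interpreter-style lookup: scan the entries comparing names."""
    for entry_name, value in env:
        if entry_name == name:
            return value
    raise KeyError(name)

def compile_offsets(env):
    """Compiler-style: resolve each name to its offset ahead of time."""
    return {name: i for i, (name, _) in enumerate(env)}

offsets = compile_offsets(env)
values = [value for _, value in env]   # at run time only the values remain
```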

Next topic: memory allocation

A critical issue in terms of performance is memory management: efficient use of memory. To do this really right requires understanding a lot about the particular hardware you will run on, but there are some general purpose issues. One of the key ones is locality.

The run-time system for SML needs to be able to allocate memory. So far we have pretended that there is an infinite set of values. Obviously, we can’t allocate an array of size 10^100… The flip side of this is that we also need to be able to deallocate memory – reclaim it when it is no longer in use. These two topics will occupy the next few lectures.

More generally, memory management is concerned with issues like:

· finding memory for a new variable or value

· avoiding putting two values in the same place

· avoiding leaving memory unused

· reusing memory if the value stored there can no longer be accessed

Heap allocation versus stack allocation

Sometimes you can tell that a variable won’t be used after the current procedure call returns. Values like this are “stack allocated”, which essentially means that they are created to automatically disappear on the return.

Anything that might persist longer is “heap allocated”. Note that the memory heap is not the same as a heap datastructure (as used in a priority queue).

How do we do heap allocation (and deallocation)? Explicit allocation (malloc) and deallocation (free) this lecture, automatic deallocation (GC) next lecture.

Allocation simple case: fixed-size blocks of storage. Keep the unused blocks of memory in a linked list, the freelist. The first word of each free block holds a pointer to the next element in the freelist. On malloc, return the block at the head of the freelist and set the free pointer to the block it points to. On free, cons this block in front of what used to be the freelist.
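
A minimal sketch of the fixed-size freelist, simulating memory as a list of words. The block size, the NULL marker and the class name are assumptions for illustration:

```python
BLOCK_WORDS = 4   # assumed block size
NULL = -1         # assumed end-of-list marker

class FixedAllocator:
    def __init__(self, num_blocks):
        self.memory = [0] * (num_blocks * BLOCK_WORDS)
        # Thread every block onto the freelist through its first word.
        for i in range(num_blocks):
            addr = i * BLOCK_WORDS
            self.memory[addr] = addr + BLOCK_WORDS if i + 1 < num_blocks else NULL
        self.free_head = 0 if num_blocks > 0 else NULL

    def malloc(self):
        """Pop the first block off the freelist."""
        if self.free_head == NULL:
            raise MemoryError("no free blocks")
        block = self.free_head
        self.free_head = self.memory[block]
        return block

    def free(self, block):
        """Cons the block onto the front of the freelist."""
        self.memory[block] = self.free_head
        self.free_head = block
```

Both operations touch only the head of the list, so each takes constant time.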

Problem: fixed size blocks don’t work. Inefficient when what you need is smaller, impossible when it is bigger. Solution: variable sized blocks.

The obvious solution is to store each block’s size in the block itself. Then we walk down the freelist looking for a block at least as big as what we need; when we find one, we return a piece of the requested size and replace the block in the freelist by the remainder.
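
A minimal sketch of this first-fit scheme with splitting, again simulating memory as a list of words. The one-word size header, the minimum block size and all names are assumptions for illustration; free does no merging of neighbours:

```python
NULL = -1
HEADER_WORDS = 1   # assumed: a size word sits in front of every block
MIN_BLOCK = 2      # a free block needs room for its size and a next pointer

class FirstFitAllocator:
    def __init__(self, total_words):
        self.memory = [0] * total_words
        self.memory[0] = total_words   # block size, header included
        self.memory[1] = NULL          # next free block
        self.free_head = 0

    def malloc(self, payload_words):
        need = max(payload_words + HEADER_WORDS, MIN_BLOCK)
        prev, block = NULL, self.free_head
        while block != NULL:           # worst case: walk the entire list
            size = self.memory[block]
            nxt = self.memory[block + 1]
            if size >= need:
                if size - need >= MIN_BLOCK:
                    # Split: the remainder replaces this block in the freelist.
                    rest = block + need
                    self.memory[rest] = size - need
                    self.memory[rest + 1] = nxt
                    self.memory[block] = need
                    replacement = rest
                else:
                    replacement = nxt  # remainder too small: hand out it all
                if prev == NULL:
                    self.free_head = replacement
                else:
                    self.memory[prev + 1] = replacement
                return block + HEADER_WORDS
            prev, block = block, nxt
        raise MemoryError("no block large enough")

    def free(self, payload_addr):
        block = payload_addr - HEADER_WORDS
        self.memory[block + 1] = self.free_head
        self.free_head = block
```

With a 16-word arena, allocating 3, 5 and 5 payload words and then freeing the first and last leaves 10 free words in two separate pieces of 4 and 6, so a request for 7 payload words fails even though enough memory is free in total.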

Problem: worst case we walk the entire list. And we can end up with lots of small free blocks, none of them big enough to satisfy a request (aka external fragmentation (internal is boring)).

Or you could have different freelists for different sizes. Note that if we use BIBOP we essentially need one freelist per type. This is a total pain! Your program can run out of, e.g., floating point numbers but still have tons of room for Booleans, etc. This is part of why SML doesn’t use BIBOP. In fact, as this example suggests, a sufficiently performance-intensive application often ends up with its own memory allocator.

Popular allocation method: buddy system. In the standard buddy system (there is also a Fibonacci buddy system), there are fixed-size blocks whose sizes are powers of 2. The actual invariant is that you need to have a set of fixed sizes, where you can split a bigger block into two blocks of smaller sizes.

If you are out of blocks of size 4, you can split a block of size 8 in two; one will be returned, the other added to the freelist of size 4. Furthermore, when a block is freed, if the other half of the block it was split from (its “buddy”, which sits right next to it and has the same size) is also free, the two of them can be merged to re-create the larger block. There is still a cost in terms of internal fragmentation because the available block sizes are “sparse”: a request is rounded up to the next size, which with powers of 2 can waste almost half the block. This is an advantage of the Fibonacci buddy system, whose sizes are more closely spaced.
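
A minimal sketch of the standard (power-of-2) buddy system. It keeps the freelists as Python sets of addresses and records allocated sizes in a dictionary, both simplifications assumed here for brevity; a real allocator would keep this bookkeeping in the blocks themselves.

```python
class BuddyAllocator:
    def __init__(self, total_size):
        assert total_size & (total_size - 1) == 0, "size must be a power of 2"
        self.total_size = total_size
        # One freelist per block size; at first only the whole arena is free.
        self.free_lists = {total_size: {0}}
        self.allocated = {}   # address -> block size

    def malloc(self, size):
        block_size = 1
        while block_size < size:
            block_size *= 2   # rounding up is the internal fragmentation
        # Find the smallest non-empty freelist that is big enough.
        candidate = block_size
        while candidate <= self.total_size and not self.free_lists.get(candidate):
            candidate *= 2
        if candidate > self.total_size:
            raise MemoryError("no block large enough")
        addr = self.free_lists[candidate].pop()
        # Split down; each upper half goes on the freelist of its size.
        while candidate > block_size:
            candidate //= 2
            self.free_lists.setdefault(candidate, set()).add(addr + candidate)
        self.allocated[addr] = block_size
        return addr

    def free(self, addr):
        size = self.allocated.pop(addr)
        # Merge with the buddy for as long as the buddy is also free.
        while size < self.total_size:
            buddy = addr ^ size   # buddies differ only in the bit for their size
            if buddy not in self.free_lists.get(size, set()):
                break
            self.free_lists[size].remove(buddy)
            addr = min(addr, buddy)
            size *= 2
        self.free_lists.setdefault(size, set()).add(addr)
```

Because block sizes are powers of 2 and blocks are aligned to their size, a block's buddy is found by flipping one bit of its address, which is what makes the merge check cheap.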

Explicit vs. automatic garbage collection

There are two basic strategies for dealing with garbage: explicit garbage collection by the programmer, and automatic garbage collection built into the language run-time system. Explicit garbage collection is provided by languages like C and C++: there is a way to explicitly deallocate (or "free") allocated memory when the programmer expects that memory to become garbage. Languages like Java and SML provide automatic garbage collection: the system automatically identifies blocks of memory that can never be used again by the program, and reclaims their space for use by later allocations.

Automatic garbage collection offers the advantage that the programmer does not have to worry about when to deallocate a given block of memory. In languages like C the need to explicitly manage memory complicates any code that allocates data on the heap, and is a significant burden on the programmer. Worse, if the programmer fails to deallocate properly, bugs are introduced into the program that are hard to find:

· If the programmer neglects to deallocate some garbage, it creates a memory leak in which some allocated memory can never again be reused. This is a problem for long-running programs, which will tend to grow in size until they consume all of memory.
· If the programmer is too aggressive and deallocates a block of memory that is still in use, this creates a dangling pointer that may be followed later even though it now points to unallocated memory or to a newly allocated value that may be of a different type.
· If a block of memory is deallocated twice, this typically corrupts the memory heap data structure even if the block was initially garbage. Corruption of the memory heap is likely to cause unpredictable effects later during execution and be difficult to debug.
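
A minimal sketch of how a double free can corrupt a freelist, using a hypothetical two-block freelist of fixed-size blocks simulated in a list of words (all names are assumptions for illustration):

```python
NULL = -1

class TinyFreelist:
    def __init__(self):
        self.memory = [NULL] * 8   # two 4-word blocks, at addresses 0 and 4
        self.head = NULL
        self.free(4)
        self.free(0)               # freelist is now 0 -> 4 -> NULL

    def malloc(self):
        if self.head == NULL:
            raise MemoryError("no free blocks")
        block = self.head
        self.head = self.memory[block]
        return block

    def free(self, block):
        self.memory[block] = self.head
        self.head = block

heap = TinyFreelist()
p = heap.malloc()   # takes block 0; freelist is 4 -> NULL
heap.free(p)        # freelist is 0 -> 4 -> NULL again
heap.free(p)        # double free: block 0 now points to itself
a = heap.malloc()
b = heap.malloc()   # same address as a: two "live" values share one block
```

After the second free, block 0's next pointer refers to block 0 itself, so every later malloc returns the same address and block 4 is never reachable from the freelist again.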

In practice, programmers manage explicit allocation and deallocation by keeping track of what piece of code "owns" each pointer in the system. That piece of code is responsible for deallocating the pointer later. The tracking of pointer ownership shows up in the specifications of code that manipulates pointers, complicating the specification, use, and implementation of the abstraction.

Automatic garbage collection helps modular programming, because two modules can share a value without having to agree on which module is responsible for deallocating it. The details of how boxed values will be managed do not pollute the interfaces in the system.