Evaluator for Prelim is now on the web. The review session will be held in section on Monday. PS5B will be out later today.
Circular data structures are not actually needed for mutual recursion (just delays…), but they are needed for, e.g., airline routes.
Memory and pointers and indirection
Byte versus word addressing (Pentium is 32 bit, byte addressed).
Alignment and word operations versus byte operations (word operations are more efficient)
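The alignment arithmetic above can be sketched in a few lines of C. This is only an illustration: the 4-byte word size is taken from the 32-bit Pentium mentioned above, and the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_SIZE 4u /* assumption: 32-bit words, as on the Pentium */

/* A byte address is word-aligned when it is a multiple of the word size. */
int is_word_aligned(size_t addr) {
    return (addr & (WORD_SIZE - 1u)) == 0;
}

/* Round a size in bytes up to a whole number of words. */
size_t round_up_to_word(size_t nbytes) {
    return (nbytes + (WORD_SIZE - 1u)) & ~(size_t)(WORD_SIZE - 1u);
}

/* Byte operation: any address is allowed. */
uint8_t load_byte(const uint8_t *mem, size_t addr) {
    return mem[addr];
}

/* Word operation: moves four bytes at once, but only from an aligned address. */
uint32_t load_word(const uint8_t *mem, size_t addr) {
    uint32_t word;
    assert(is_word_aligned(addr));
    memcpy(&word, mem + addr, sizeof word);
    return word;
}
```

With byte addressing, addresses 0, 4, 8, … are the word boundaries, so a 5-byte object rounded up to whole words occupies 8 bytes.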
The memory hierarchy – pretending we have a large fast memory when in fact we have a small fast memory and a large slow memory. This happens at many levels (about 4 times within a Pentium, then main memory, then in the disk drive, etc.) Importance of locality, spatially and temporally.
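One way to see spatial locality is to walk the same array in two different orders. The sketch below is illustrative only; the array size and function names are arbitrary choices for this example. C stores a two-dimensional array row by row, so the first traversal touches adjacent addresses while the second jumps a whole row between accesses.

```c
#include <assert.h>

#define ROWS 256
#define COLS 256

static int grid[ROWS][COLS];

/* Fill the grid with some arbitrary values. */
void fill_grid(void) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            grid[r][c] = r + c;
}

/* Visits elements in address order: consecutive accesses are neighbours. */
long sum_row_major(void) {
    long total = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Visits the same elements, but each access is COLS ints away from the last. */
long sum_col_major(void) {
    long total = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}

/* Both orders compute the same answer; only the access pattern differs. */
void check_same_answer(void) {
    fill_grid();
    assert(sum_row_major() == sum_col_major());
}
```

Whether the second loop is measurably slower depends on the array size and the particular cache, so no timing is claimed here.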
The SML runtime has only 4 types: integers, pointers, records and strings. A field is either a pointer or an integer. We tell the two apart by the low-order bit: pointers are word-aligned, so their low-order bits are always zero and one of them is free to act as a tag.
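A minimal sketch of the low-order-bit trick, in C. The particular encoding is an assumption made for this example (integers carry a 1 in the low bit and are stored as 2n+1; pointers keep their natural 0), and the names field_t, box_int and so on are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t field_t; /* one machine word: either a tagged int or a pointer */

/* Assumed convention: low bit 1 means integer, low bit 0 means pointer. */
int is_int(field_t f)     { return (f & 1u) != 0; }
int is_pointer(field_t f) { return (f & 1u) == 0; }

/* Store the integer n as 2n+1 so that its low bit is always 1. */
field_t box_int(intptr_t n) {
    return ((uintptr_t)n << 1) | 1u;
}

/* Recover n from 2n+1. */
intptr_t unbox_int(field_t f) {
    assert(is_int(f));
    return ((intptr_t)f - 1) / 2;
}

/* A word-aligned pointer already has a 0 in its low bit, so it is stored as-is. */
field_t box_pointer(void *p) {
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

void *unbox_pointer(field_t f) {
    assert(is_pointer(f));
    return (void *)f;
}
```

The price of the tag is one bit of integer range: a word of w bits can only hold integers of w-1 bits.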
Note that we will assume that, given a pointer, it is easy to determine the type of the object it points to. How can we do this? The easy way is to use the high-order bits of the address (why not the low-order bits?). This scheme is called BIBOP ("big bag of pages").
SML doesn’t use BIBOP (why?); instead, a record is represented as [record type tag, length, elt1, elt2, …]. In the actual implementation, the tag and length are compressed into a single 32-bit word.
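To make the BIBOP idea concrete, here is a small simulation in C. Everything specific in it is an assumption for illustration: the 4 KB page size, the 16-page heap, the type names, and the use of plain 32-bit numbers in place of real addresses. The point is only that the high-order bits of an address select a page, and a per-page table records the type of every object on that page.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12u /* assumption: 4 KB pages */
#define NUM_PAGES 16u /* assumption: a tiny simulated heap */

typedef enum { TYPE_FREE, TYPE_RECORD, TYPE_STRING } obj_type;

/* One entry per page: every object on a page has the same type. */
static obj_type page_type[NUM_PAGES];

/* The high-order bits of an address are its page number. */
uint32_t page_of(uint32_t addr) {
    return addr >> PAGE_BITS;
}

/* Dedicate a whole page to objects of one type. */
void set_page_type(uint32_t page, obj_type t) {
    assert(page < NUM_PAGES);
    page_type[page] = t;
}

/* Type lookup needs only the pointer itself, not the object it points to. */
obj_type type_of(uint32_t addr) {
    uint32_t page = page_of(addr);
    assert(page < NUM_PAGES);
    return page_type[page];
}
```

For example, after set_page_type(2, TYPE_STRING), every address from 0x2000 through 0x2FFF is reported as a string.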
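Packing the tag and length into one word might look like the following C sketch. The split used here (8 bits of tag in the low end, 24 bits of length above it) is purely an assumption for illustration; the notes do not say how the real implementation divides the word.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS   8u                                /* assumed split, not SML's actual one */
#define TAG_MASK   ((1u << TAG_BITS) - 1u)
#define MAX_LENGTH ((1u << (32u - TAG_BITS)) - 1u)

/* Combine a type tag and an element count into a single 32-bit header. */
uint32_t make_header(uint32_t tag, uint32_t length) {
    assert(tag <= TAG_MASK);
    assert(length <= MAX_LENGTH);
    return (length << TAG_BITS) | tag;
}

/* The tag lives in the low TAG_BITS bits. */
uint32_t header_tag(uint32_t header) {
    return header & TAG_MASK;
}

/* The length lives in the remaining high bits. */
uint32_t header_length(uint32_t header) {
    return header >> TAG_BITS;
}
```

A record then occupies one header word followed by its elements, rather than separate tag and length words.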
In SML the types are pretty much known at compile time, which is why programs that compile tend to run (this is a very useful property). Compare, for example, the int 0 and the bool false: since the compiler knows the type of each, nothing at run time needs to distinguish them. In reality, most variables would also somehow store their associated types. But we will ignore this issue for simplicity and just assume that the types of all variables are known and represented “somewhere”. We will also ignore parameterized types; they can be handled, but they introduce additional complexity.
Strings usually use 1 byte per character and are null-terminated. Quick overview of string operations. SML strings, unlike C strings, are word-aligned. “Wide” strings and Unicode (source of many bugs!)
The list [6,9,42] will be [CONS,6,ptr] [CONS,9,ptr] [CONS,42,nil], where each ptr points to the next cell. Note that this allows you to easily create new lists that share structure with old lists (for example, consider what x::l does…)
A ref is just a memory location holding a pointer.
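The cell layout and the sharing it permits can be sketched in C as follows. The struct and function names are hypothetical, malloc stands in for the runtime's allocator, and nothing is ever freed here, since reclaiming memory is a separate topic.

```c
#include <assert.h>
#include <stdlib.h>

enum { TAG_CONS = 1 }; /* hypothetical tag value */

/* One list cell: [CONS, head, pointer to the rest of the list]. */
typedef struct cell {
    int tag;
    int head;
    const struct cell *tail; /* NULL plays the role of nil */
} cell;

/* x::l allocates exactly one new cell and points it at the existing list. */
const cell *cons(int head, const cell *tail) {
    cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->tag = TAG_CONS;
    c->head = head;
    c->tail = tail;
    return c;
}

/* Count the cells reachable from l. */
int length(const cell *l) {
    int n = 0;
    for (; l != NULL; l = l->tail)
        n++;
    return n;
}
```

Building l = [6,9,42] takes three cells; 1::l and 2::l each add one more cell and both point at the same three cells of l. Because the cells are never modified after creation, this sharing is safe.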
A tuple is a sequence of memory locations (there is some freedom about how to represent a nested tuple).
An environment might be an array of known size, where each element is a string pointer (to the name) and a value (which may in turn be a pointer). An actual compiler will get rid of the names and replace them with offsets.
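As a rough C sketch of such an environment: each slot pairs a name pointer with a value word. The names binding, lookup_offset and value_at are invented for this example, and the split between a "compile-time" lookup and a "run-time" access is a simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One environment slot: a pointer to the name and a value word. */
typedef struct {
    const char *name;
    intptr_t value; /* an integer, or a pointer to something larger */
} binding;

/* Lookup by name: scans the array comparing strings. Returns the slot's
   offset, or -1 if the name is not bound. */
int lookup_offset(const binding *env, int size, const char *name) {
    for (int i = 0; i < size; i++)
        if (strcmp(env[i].name, name) == 0)
            return i;
    return -1;
}

/* Access by offset: what remains once the names have been compiled away. */
intptr_t value_at(const binding *env, int size, int offset) {
    assert(offset >= 0 && offset < size);
    return env[offset].value;
}
```

A compiler can call something like lookup_offset once per variable reference while compiling and emit only the resulting offset, so the running program never compares strings.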
A critical issue in terms of performance is memory management: efficient use of memory. To do this really right requires understanding a lot about the particular hardware you will run on, but there are some general-purpose issues. One of the key ones is locality.
The run-time system for SML needs to be able to allocate memory. So far we have pretended that there is an infinite set of values. Obviously, we can’t allocate an array of size 10^100… The flip side of this is that we also need to be able to deallocate memory: reclaim it when it is no longer in use. These two topics will occupy the next few lectures.
More generally, memory management is concerned with issues like:
· finding memory for a new variable or value
· avoiding putting two values in the same place
· avoiding leaving memory unused
· reusing memory if the value stored there can no longer be accessed
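The four concerns above show up even in a very small allocator. The C sketch below hands out fixed-size cells from a static pool and keeps a free list of returned cells. It is an illustration under stated assumptions, not how the SML runtime works: the pool size and names are made up, and the explicit cell_free call stands in for whatever mechanism decides that a value can no longer be accessed.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CELLS 4 /* assumption: a deliberately tiny pool */

/* A cell either holds data or, while free, a link to the next free cell. */
typedef union cell {
    union cell *next_free;
    int payload[2];
} cell;

static cell pool[POOL_CELLS];
static int next_unused = 0;     /* cells at or past this index were never handed out */
static cell *free_list = NULL;  /* cells that were handed out and later returned */

/* Find memory for a new value; returns NULL when the pool is exhausted. */
cell *cell_alloc(void) {
    if (free_list != NULL) {           /* reuse returned memory first */
        cell *c = free_list;
        free_list = c->next_free;
        return c;
    }
    if (next_unused < POOL_CELLS)      /* otherwise take a fresh cell */
        return &pool[next_unused++];
    return NULL;
}

/* Return a cell whose value can no longer be accessed. */
void cell_free(cell *c) {
    assert(c >= pool && c < pool + POOL_CELLS);
    c->next_free = free_list;
    free_list = c;
}
```

Two live cells never overlap because each one is either taken fresh from the pool or removed from the free list before being handed out, and freed cells are the first to be reused, so memory is not left idle.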