CS312 Lecture 25: Memory and data structures

Environments as Hash Tables

We have studied how an environment is extended by let expressions and by function application; these are the points at which new bindings are introduced into the environment. We now consider several different approaches to implementing an environment for an interpreter.

Approach 1

A first approach is to implement the environment in two parts:

- a hash table holding the bindings of the global environment, and
- a list (an association list of name/value pairs) holding the local bindings.

When we look up a binding, we first traverse the list (the local environment); if the identifier is not there, we look it up in the hash table.

An immediate observation is that this implementation is very inefficient when local environments are large: lookup degenerates into a linear scan, exactly as if a list were used to represent the whole environment.
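
Here is a minimal sketch of this mixed representation in SML, assuming the SML/NJ Library's HashTable and HashString structures are available (the names value, globals, and lookup are ours, not the interpreter's):

    (* stand-in for the interpreter's value type *)
    type value = int

    exception NotFound

    (* global environment: a single hash table *)
    val globals : (string, value) HashTable.hash_table =
        HashTable.mkTable (HashString.hashString, op=) (1000, NotFound)

    (* local environment: an association list, most recent binding first *)
    type local_env = (string * value) list

    fun lookup (locals : local_env) (x : string) : value =
        case List.find (fn (y, _) => y = x) locals of
            SOME (_, v) => v                    (* found in the local list *)
          | NONE => HashTable.lookup globals x  (* fall back to the globals *)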

Approach 2

Can we do better than the mixed approach of a hash table for the global environment plus a list for the local environment? Here is an interesting variant: suppose that we use a hash table for each frame. Thus, the environment is a list of hash tables. The crucial line in the interpreter is where evalApply calls evaluate, which is where the new frame is created. Currently it uses various ListPair primitives; instead, it would need to create a new hash table.

So we represent the environment as a list of hash tables. To do this, we first define a frame as the set of bindings introduced by a single let or by a single evalApply. For instance:

let                        -+
    val x = v1              |  Frame A
    val y = v2             -+
    in
     let val z = v3         ]  Frame B
     in
       let val w = v4       ]  Frame C
       in  (*) ...

The environment at (*) is then [HT C, HT B, HT A, HT Global]. To look up a binding, we first search HT C, which holds the bindings of frame C; if the identifier is not there, we continue with the next hash table (which corresponds to moving out to the enclosing frame).
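
A sketch of this representation, reusing the value type and NotFound exception from the sketch above (newFrame and lookup are our names):

    type frame = (string, value) HashTable.hash_table
    type env = frame list  (* innermost frame first; last element is the globals *)

    (* entering a let or an application pushes a fresh frame *)
    fun newFrame (bindings : (string * value) list) (env : env) : env =
        let val ht : frame =
                HashTable.mkTable (HashString.hashString, op=) (16, NotFound)
        in  List.app (HashTable.insert ht) bindings;
            ht :: env
        end

    (* search the frames from innermost to outermost *)
    fun lookup ([] : env) (x : string) : value = raise NotFound
      | lookup (f :: outer) x =
          case HashTable.find f x of
              SOME v => v
            | NONE => lookup outer x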

Of course, there is still a problem: the linear search through the hash tables. Suppose that we are deep inside a series of lets (say, 100 of them). Each time we look up an identifier bound in the global environment, we must search 100 hash tables before reaching the global environment's table. And this happens, for instance, every time we encounter a pervasive identifier such as times.

One possibility is to "hash and cache": each time we do a lookup, we store (a copy of) the binding in the first local environment. This combination (caching, in a hash table, the solution to something that is expensive to compute) is quite powerful, and is a standard way to speed up many programs. We could also post-process the list of hash tables to move copies of everything into the first local environment. The trade-off is that if a typical global binding is looked up only once, we end up doing a lot of unnecessary caching and post-processing.
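
The caching variant can be sketched as follows, reusing env and lookup from the sketch above: a hit in an outer frame is copied into the innermost frame, so the next lookup of the same identifier stops immediately.

    fun lookupCache ([] : env) (x : string) : value = raise NotFound
      | lookupCache (inner :: outer) x =
          case HashTable.find inner x of
              SOME v => v                            (* cached or truly local *)
            | NONE =>
                let val v = lookup outer x           (* search the outer frames *)
                in  HashTable.insert inner (x, v);   (* cache a copy innermost *)
                    v
                end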

Approach 3

This implementation uses a single global hash table for all bindings, local and global alike. It is based on associating bindings with numbers: where before the hash table could be treated as a partial function from names to values, it is now a partial function from name/number pairs to values.

A list of these numbers represents the current environment.

Each number corresponds to a frame (just as the previous approach created a hash table per frame). Removing a number from the environment is then equivalent to removing the bindings declared in that frame. Doing this in an interpreter is a bit tricky: essentially, it requires implementing a garbage collector to clean out of the table the bindings of frames that are no longer in use.
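
One plausible way to realize this scheme in SML; the representation below, with one table entry per name holding a list of frame-numbered bindings, is our guess at the details:

    (* one global table: each name maps to its bindings, tagged by frame number *)
    val table : (string, (int * value) list ref) HashTable.hash_table =
        HashTable.mkTable (HashString.hashString, op=) (1000, NotFound)

    type env = int list  (* frame numbers, innermost first *)

    fun bind (frame : int) (x : string) (v : value) : unit =
        case HashTable.find table x of
            SOME l => l := (frame, v) :: !l
          | NONE   => HashTable.insert table (x, ref [(frame, v)])

    (* return x's binding from the innermost frame of env that binds it;
       bindings tagged with frame numbers no longer in any environment
       are the garbage that the collector must eventually remove *)
    fun lookup (env : env) (x : string) : value =
        let val bs = case HashTable.find table x of
                         SOME l => !l
                       | NONE   => raise NotFound
            fun first [] = raise NotFound
              | first (f :: outer) =
                  case List.find (fn (f', _) => f' = f) bs of
                      SOME (_, v) => v
                    | NONE => first outer
        in  first env  end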

Data structures and locality

For reasons that are basically due to physics, computer memory is always arranged in a hierarchy: a small amount of fast memory and larger amounts of increasingly slower memory. The Intel Itanium 2 chip, for example, has several levels of on-chip cache between its registers and main memory, each level larger and slower than the one before.

There is a huge performance advantage to algorithms and data structures that get their data from the fast memory. Note that this advantage doesn't show up in big-O analysis; it comes from huge constant factors.

The key issue in getting this performance advantage is locality. Locality in space is the tendency of a program to revisit nearby locations. Locality in time is the tendency to visit the same location at nearby points in time. Computers are designed so that when a memory location is read, the contents of that location, plus some nearby ones, are brought into the fastest memory.

Why? For one thing, for various hardware reasons it is fastest to read memory this way: a read from slower memory always fetches a sizable block of data. Keeping the nearby locations helps when the program has locality in space; using some kind of LRU replacement strategy helps with locality in time.

However, there are also algorithms that are designed to exploit locality. The two best examples are B-trees and splay trees, which we will discuss briefly today.

B-trees

A B-tree of order m is a search tree in which each non-leaf node has up to m children. B-trees were originally invented for storing data structures on disk, where locality is even more crucial than with memory. Accessing a disk location takes about 5 ms = 5,000,000 ns. Therefore, if you are storing a tree on disk, you want to make sure that a given disk read is as effective as possible. B-trees, with their high branching factor, ensure that few disk reads are needed to navigate to the place where data is stored. B-trees are also useful for in-memory data structures, because these days main memory is almost as slow relative to the processor as disk drives were when B-trees were introduced!

Example of a B-tree:

                   [4 , 10 , 16]
                  /    |    |   \
                 /     |    |    \
                /      |    |     \
         [1,2,3]  [6,7,9]  [11,12,15]  [17,22,36]
The data structure satisfies several invariants:

- Every path from the root to a leaf has the same length; that is, all leaves are at the same depth.
- Every node other than the root has between ceil(m/2) and m children, so every node is at least half full; the root has between 2 and m children (unless the tree is small enough that the root is a leaf).
- A node with k children stores k-1 keys, which separate the ranges of the keys stored in its subtrees.
- The keys stored within each node are sorted.

Because the height of the tree is uniformly the same and every node is at least half full, we are guaranteed that the asymptotic performance is O(lg n) where n is the size of the collection. The real win is in the constant factors, of course. We can choose m so that the pointers to the m children plus the m-1 elements fill out a cache line at the highest level of the memory hierarchy where we can expect to get cache hits. For example, if we are accessing a large disk database then our "cache lines" are memory blocks of the size that is read from disk.
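
For a concrete (and purely illustrative) calculation: with 64-byte cache lines, 4-byte keys, and 4-byte child pointers, we want the largest m with

    m * 4 + (m - 1) * 4 <= 64

which gives m = 8: each node holds up to 8 child pointers and 7 keys in one cache line.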

Lookup in a B-tree is straightforward. Given a node to start from, we use a simple linear or binary search to find whether the desired element is in the node, or if not, which child pointer to follow from the current node.
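
A sketch of lookup over an immutable, list-based representation (a real implementation would use arrays sized to fit a cache line or disk block, but the search logic is the same):

    datatype btree =
        Leaf of int list                 (* sorted keys *)
      | Node of int list * btree list    (* k-1 sorted keys, k children *)

    fun member (x : int) (t : btree) : bool =
        case t of
            Leaf keys => List.exists (fn k => k = x) keys
          | Node (keys, children) =>
              if List.exists (fn k => k = x) keys then true
              else
                  (* the number of keys less than x indexes the child to follow *)
                  let val i = length (List.filter (fn k => k < x) keys)
                  in  member x (List.nth (children, i))  end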

Insertion and deletion from a B-tree are more complicated; in fact, they are notoriously difficult to implement correctly. For insertion, we first find the appropriate leaf node into which the inserted element falls (assuming it is not already in the tree). If there is already room in the node, the new element can be inserted simply. Otherwise the current leaf is already full and must be split into two leaves, one of which acquires the new element. The parent is then updated to contain a new key and child pointer. If the parent is already full, the process ripples upwards, eventually possibly reaching the root. If the root is split into two, then a new root is created with just two children.
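
The heart of insertion is the split step. A sketch of splitting an overfull node's key list, with the median key pushed up to the parent:

    (* split an overfull key list into (left keys, median, right keys) *)
    fun split (keys : int list) : int list * int * int list =
        let val n = length keys div 2
        in  (List.take (keys, n),      (* keys for the new left node *)
             List.nth (keys, n),       (* median key, moved up to the parent *)
             List.drop (keys, n + 1))  (* keys for the new right node *)
        end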

Splay trees

A splay tree is an efficient implementation of binary search trees that takes advantage of locality in the incoming lookup requests. Locality in this context is a tendency to look for the same element multiple times. A stream of requests exhibits no locality if every element is equally likely to be accessed at each point. For many applications, there is locality, and elements tend to be accessed repeatedly. A good example of an application with this property is a network router. Routers must decide on which outgoing wire to route the incoming packets, based on the IP address in the packets. The router needs a big table (a map) that can be used to look up an IP address and find out which outgoing connection to use. If an IP address has been used once, it is likely to be used again, perhaps many times. Splay trees are designed to provide good performance in this situation.

In addition, splay trees offer amortized O(lg n) performance. That is, a sequence of M operations on an n-node splay tree takes O(M lg n) time.

A splay tree is a binary search tree. It has one interesting difference, however: whenever an element is looked up in the tree, the splay tree reorganizes to move that element to the root of the tree, without breaking the binary search tree invariant. If the next lookup request is for the same element, it can be returned immediately. In general, if a small number of elements are being heavily used, they will tend to be found near the top of the tree and are thus found quickly.

We have already seen a way to move an element upward in a binary search tree: tree rotation. When an element is accessed in a splay tree, tree rotations are used to move it to the top of the tree. This simple algorithm can result in extremely good performance in practice. Notice that the algorithm requires that we be able to update the tree in place, but the abstract view of the set of elements represented by the tree does not change and the rep invariant is maintained. This is an example of a benevolent side effect: a side effect that does not change the abstract view of the value represented.

There are three kinds of tree rotations that are used to move elements upward in the tree. These rotations have two important effects: they move the node being splayed upward in the tree, and they also shorten the path to any nodes along the path to the splayed node. This latter effect means that splaying operations tend to make the tree more balanced.
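
The three rotations below can be sketched over an ordinary binary tree datatype. (For simplicity, these sketches are purely functional and return the rotated tree; an actual splay tree implementation would update the tree in place, as discussed above.)

    datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree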

Rotation 1: Simple rotation

A simple tree rotation is applied at the root of the splay tree, moving the splayed node x up to become the new tree root. Here we have A < x < B < y < C, and the splayed node is either x or y, depending on the direction of the rotation.

    y                x
   / \              / \
  x   C    <->     A   y
 / \                  / \
A   B                B   C
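
In code, the left-to-right direction of this picture (splaying x) looks like this; the other direction is its inverse:

    (* rotate right at the root: lift the left child x above y *)
    fun rotateRight (Node (Node (a, x, b), y, c)) = Node (a, x, Node (b, y, c))
      | rotateRight t = t

    (* rotate left at the root: lift the right child y above x *)
    fun rotateLeft (Node (a, x, Node (b, y, c))) = Node (Node (a, x, b), y, c)
      | rotateLeft t = t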

Rotation 2: Zig-Zig and Zag-Zag

Lower down in the tree, rotations are performed in pairs, so that nodes on the path from the splayed node to the root move closer to the root on average. In the "zig-zig" case, the splayed node is the left child of a left child; the mirror image, the right child of a right child, is the "zag-zag" case.

      z             x               
     / \           / \
    y   D         A   y
   / \      <->      / \        (A < x < B < y < C < z < D)
  x   C             B   z
 / \                   / \
A   B                 C   D
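
A sketch of the left-to-right direction of this picture (splaying x two levels up; the zag-zag case is the mirror image):

    fun zigzig (Node (Node (Node (a, x, b), y, c), z, d)) =
          Node (a, x, Node (b, y, Node (c, z, d)))
      | zigzig t = t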

Rotation 3: Zig-Zag

In the "zig-zag" case, the splayed node is the left child of a right child or vice-versa. The rotations produce a subtree whose height is less than that of the original tree. Thus, this rotation improves the balance of the tree. In each of the two cases shown, y is the splayed node:

 
       z                y                x
      / \              / \              / \
     x   D            /   \            A   z
    / \         ->   x     z    <-        / \
   A   y            / \   / \            y   D
      / \          A   B C   D          / \
     B   C                             B   C
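
A sketch of the first of the two cases pictured, where the splayed node y is the right child of a left child (the other case is the mirror image):

    fun zigzag (Node (Node (a, x, Node (b, y, c)), z, d)) =
          Node (Node (a, x, b), y, Node (c, z, d))
      | zigzag t = t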

CS312  © 2002 Cornell University Computer Science