Cache-conscious Recursive Data Structure

Many applications use recursive pointer-based data structures like linked lists, trees, hash tables etc. For example, N-body simulations use geometric data structures like quad-trees and oct-trees, while database applications use B-trees and hash tables.

Because of their irregular structure, such data structures present many challenges to the compiler and underlying hardware. For instance, most programmers do not pay very close attention to how heap allocated objects are laid out in memory. This can result in lost performance, because objects that are temporally adjacent (one is accessed soon after another), may not be spatially adjacent (the two are not in the same part of memory). As Richard Sites put it in, "Random Memory Access Abuse",

Without extreme care in memory access patterns, pointer structures are death. This is in stark contrast to what one might "learn" about computer design by studying the SPEC benchmarks. Controlling memory access patterns will drive hardware and software designs for the foreseeable future.

Languages like Java that provide garbage collection remove most of the memory management burden from the programmer, but this means that the system must now assume the responsibility for allocating objects in a memory "friendly" way. This requires compiler help.

The goal of this project is to come up with analyses and transformations that result in "cache-conscious" recursive data structure code: that is, code that will run well on a machine with a memory hierarchy. Here are some papers to get you started.

  1. There are a bunch of papers on this topic that we have collected here .
  2. Read the two PLDI 99 papers by Chilimbi and Larus to get a feel for a practical approach to this problem.
  3. Read the survey paper by Vitter to get a feel for some theoretical models of locality enhancement.
  4. Invent transformations for enhancing locality in programs that deal with recursive data structures, and implement them. What kinds of analysis techniques are required? See the collected papers for some previously developed analysis techniques.
  5. Here's a different approach to specifying information at the source level for restructuring these kinds of applications.
  6. Your contact person for this project is Keshav Pingali.