Cache-conscious Database Systems

Until recently, database systems kept most of their data on disk, and the goal of database optimization was to minimize the number of disk accesses. This is beginning to change because of the availability of commodity parallel processors with enormous amounts of main memory. The prediction is that within the next decade, it will be common to have a terabyte of main memory serving as a buffer pool for a hundred-terabyte database. In such databases, most of the working set of a database query might be in main memory, and the performance of the database query engine might depend critically on the cache behavior of the database. Some evidence for this belief comes from two recent papers that we have linked below. The first paper is by Ailamaki et al from the Wisconsin database group, and it is a study of where time is spent when a database query is executed. They report studies on four memory-resident commercial DBMS systems. They conclude that database implementors must (a) optimize data placement for the L2 cache, and (b) optimize instruction placement to reduce L1 instruction cache. The goal of this project is to study the impact of caches on database design. Here are some questions to think about. There are any number of research problems here. Johannes Gehrke has offered to help with this project. You can use the Predator database system for carrying out an implementation of ideas you come up with. Here are some references to get you started:
  1. Ailamaki et al: DBMSs on a modern processor: where does time go?. VLDB 99.
  2. Jun Rao, Kenneth Ross: Cache conscious indexing for decision-support in main memory. VLDB 99: 78-89.
  3. Ambuj Shatdal, Chander Kant, Jeffrey F. Naughton: Cache Conscious Algorithms for Relational Query Processing. VLDB 1994: 510-521
  4. H. V. Jagadish, Daniel F. Lieuwen, Rajeev Rastogi, Abraham Silberschatz, S. Sudarshan: Dalí: A High Performance Main Memory Storage Manager. VLDB 1994: 48-59
  5. Kimberly Keeton, David A. Patterson, Yong Qiang He, Roger C. Raphael, Walter E. Baker: Performance Characterization of a Quad Pentium Pro SMP using OLTP Workloads. ISCA 1998: 15-26
  6. Jerry Baulier, Philip Bohannon, S. Gogate, C. Gupta, S. Haldar, S. Joshi, A. Khivesera, Henry F. Korth, P. McIlroy, J. Miller, P. P. S. Narayan, M. Nemeth, Rajeev Rastogi, S. Seshadri, Abraham Silberschatz, S. Sudarshan, M. Wilder, C. Wei: DataBlitz Storage Manager: Main Memory Database Performance for Critical Applications. SIGMOD Conference 1999: 519-520
  7. Tobin J. Lehman, Michael J. Carey: A Study of Index Structures for Main Memory Database Management Systems. VLDB 1986: 294-303
  8. Tobin J. Lehman, Michael J. Carey: Query Processing in Main Memory Database Management Systems. SIGMOD Conference 1986: 239-250
  9. Anthony LaMarca, Richard E. Ladner: The Influence of Caches on the Performance of Sorting. SODA 1997: 370-379