Speaker:  Anastassia Ailamaki
Affiliation:  University of Wisconsin
Date:  3/28/00
Time & Location:  4:15 PM, B17 Upson Hall
Host:  Johannes Gehrke
Title:  Architecture-Conscious Database Systems

Modern high-performance processors employ sophisticated techniques to overlap and simultaneously execute multiple computation and memory operations. Intuitively, these techniques should help database applications, which are becoming increasingly compute- and memory-bound. Unfortunately, recent research indicates that, unlike that of scientific workloads, the performance of database systems has not improved commensurately with increases in processor speed. As the gap between memory and processor speed widens, research on database systems has focused on minimizing memory latencies for isolated algorithms. However, to design high-performance database systems, it is important to carefully evaluate and understand the interaction between the database software and the underlying hardware.

The first part of this talk introduces a framework for analyzing query execution time on a database system running on a server with a modern processor and memory architecture. Experiments with a variety of benchmarks show that database developers should (a) optimize data placement for the second level of data cache, (b) optimize instruction placement to reduce first-level instruction cache stalls, but (c) not expect the overall execution time to decrease significantly without addressing stalls related to subtle implementation issues (e.g., branch prediction).

The second part of the talk focuses on optimizing data placement for access to the second-level cache. Most commercial DBMSs store records contiguously on disk pages, using the slotted-page approach (NSM). During a single-attribute scan, NSM exhibits poor spatial locality, which hurts cache performance. The decomposition storage model (DSM) has better spatial locality, but incurs a high record reconstruction cost. We introduce Partition Attributes Across (PAX), a new layout for data records that is applied orthogonally to NSM pages and offers optimized cache utilization with no extra space or time penalty.
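
The difference between the three layouts can be sketched with a toy example. This is only an illustration under simplifying assumptions, not the implementation presented in the talk: the sample records, the function names, and the flat-list "page" are all hypothetical, and real pages also carry headers, slot arrays, and variable-length fields.

```python
# Hypothetical sample data: three records with attributes (id, name, age).
records = [(1, "ann", 30), (2, "bob", 41), (3, "cat", 25)]

def nsm_page(rows):
    """NSM: whole records are stored one after another within the page."""
    return [value for row in rows for value in row]

def dsm_tables(rows):
    """DSM: one separate sub-relation per attribute, keyed by record position."""
    num_attrs = len(rows[0])
    return [[(i, row[a]) for i, row in enumerate(rows)] for a in range(num_attrs)]

def pax_page(rows):
    """PAX (simplified): the page holds the same records as the NSM page,
    but the values of each attribute are grouped together inside it."""
    return [list(column) for column in zip(*rows)]

def nsm_scan_offsets(num_rows, num_attrs, attr):
    """Offsets touched in the flat NSM page when scanning a single attribute."""
    return [r * num_attrs + attr for r in range(num_rows)]

# Scanning only "age" (attribute 2) strides across the NSM page,
# while in the PAX sketch the same values sit next to each other.
age_offsets_nsm = nsm_scan_offsets(len(records), 3, 2)
age_values_pax = pax_page(records)[2]

# Rebuilding records from the PAX sketch stays within one page;
# with DSM the pieces would come from separate sub-relations.
rebuilt = list(zip(*pax_page(records)))
```

In this sketch the single-attribute scan reads every third slot of the NSM page, whereas the PAX grouping keeps the scanned values adjacent, which is the spatial-locality effect the abstract refers to.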