Tuesday, April 25, 2006
4:15 pm
B17 Upson Hall

Computer Science
Spring 2006

Alan Halverson
University of Wisconsin, Madison

Storage and Query Processing Optimizations For Hierarchical Data

All major relational database management systems use some variant of a "row store" to store their data. This design offers simplicity of implementation while providing excellent performance for write-intensive workloads. In this talk I describe two storage optimizations for a row store architecture given a read-mostly query workload - "super tuples" and "column abstraction." Recently, a column store system named C-Store has also shown significant performance benefits for read-mostly query workloads. I implemented both my optimized row store and C-Store in a common framework in order to perform an "apples-to-apples" comparison of the optimizations in isolation and in combination. Although the C-Store system offers tremendous performance benefits for scanning a small fraction of columns from a table, my optimized row store provides disk storage savings, reduced sequential scan times, and low additional CPU overhead while requiring only evolutionary changes to a standard row store.