Thursday, February 22, 2007
4:15 pm
B17 Upson Hall

Computer Science
Spring 2007

Daniel Abadi
Massachusetts Institute of Technology

Query Execution in Column-Oriented Database Systems

Recent research on column-oriented database management systems (DBMSs) has shown that these systems can outperform existing row-oriented DBMSs by one to two orders of magnitude on read-mostly query workloads such as those found in data warehouses, decision support, and customer relationship management systems. In this talk, I will discuss this exciting new class of database systems and provide an overview of the C-Store system that we have developed over the past two years at MIT. I will then focus on the design of the column-oriented query execution engine I have developed. In particular, I will discuss the impact on query performance of tuple construction (stitching together attributes from multiple columns into a row-oriented "tuple") and of operating on compressed data. Tuple construction allows column-oriented DBMSs to offer a standards-compliant relational database interface (e.g., ODBC or JDBC); however, if done at the wrong point in a query plan, it incurs a significant performance penalty. Similarly, data compression can improve query performance by an order of magnitude by trading cheap CPU cycles for expensive I/O bandwidth.
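
To make the two ideas in the abstract concrete, here is a minimal sketch in Python. It is not C-Store code: the toy table, the function names, and the choice of run-length encoding as the compression scheme are all assumptions made for illustration. It contrasts constructing tuples before a filter with constructing them after, and shows an aggregate computed on compressed runs without decompressing them.

```python
# Hypothetical toy table stored column by column (names are illustrative only).
columns = {
    "region": ["east", "east", "east", "west", "west"],
    "price": [10, 25, 40, 5, 60],
}

def early_materialization(cols, min_price):
    # Stitch every row into a tuple first, then filter the tuples.
    rows = list(zip(cols["region"], cols["price"]))
    return [row for row in rows if row[1] > min_price]

def late_materialization(cols, min_price):
    # Filter on the price column alone to get matching positions,
    # then construct tuples only for those positions.
    positions = [i for i, p in enumerate(cols["price"]) if p > min_price]
    return [(cols["region"][i], cols["price"][i]) for i in positions]

def rle_encode(values):
    # Run-length encode a column as [value, run_length] pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def count_by_value_rle(runs):
    # Aggregate directly on the runs without decompressing them.
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts
```

Both strategies return the same rows; the difference is how many tuples are built along the way. The early version constructs all five tuples, while the late version constructs only those that pass the filter. Likewise, the count over the encoded region column touches two runs rather than five values.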