CS6320

Logistics

Course Description

In this course, we review recent trends and foundational work in databases and large-scale data analysis. Starting from the foundations of relational databases, we review recent research aimed at making data processing more efficient, in areas such as column stores, main-memory databases, query compilation, and approximate query engines. We cover parallel and distributed databases, NoSQL and NewSQL systems, stream processing engines, graph databases, and systems for data mining and large-scale machine learning. Finally, we review approaches to making databases more user-friendly, including natural language interfaces and automated data visualization.

An important component of this course is the course project, which requires you to research a database-related problem of your choice.

Workload and Grading

Course Schedule (Draft)

August 23: Introduction to the course

August 25: Basics, Architecture of a Database Management System

Section 1: Foundations

August 30: Joins

September 1: Indexing

September 6: (VLDB)

September 8: (VLDB)

September 13: Query Optimization

September 15: Selectivity Estimation & Robust Optimization

September 16: Concurrency Control (Location: Gates 405!)

September 20: Logging and Recovery

September 22: Buffer Management

Section 2: Efficient Query Processing

September 27: Column Stores

September 29: Main Memory Databases

October 4: Query Compilation (Nancy)

October 6: Online/Approximate Processing

October 11: (Fall Break)

October 13: Processing on Novel Hardware

October 18: (Massively) Parallel Processing

Optional: D. J. DeWitt, J. Gray: Parallel database systems: the future of high-performance database systems. CACM 1992.

October 20: Data Warehousing vs. MAD Analytics (Jenny)

Section 3: Efficient Transaction Processing

October 25: CAP Theorem vs. NoSQL Databases

October 27: NewSQL (Kai)

November 1: Coordination Avoidance

Optional: S. Roy et al.: The homeostasis protocol: avoiding transaction coordination through program analysis. SIGMOD 2015.

Section 4: Beyond Relational Data Processing

Optional: Video of M. Stonebraker on "One size fits all: an idea whose time has come and gone".

November 3: Graph Databases (Nancy)

November 8: Stream Processing (Kai)

November 10: Machine Learning (Sanjana)

November 15: Knowledge Mining

Section 5: User Interfaces

November 17: Novel Query Interfaces (Sanjana)

Optional: Video of VLDB 2015 Panel on "Design for Interaction"

November 22: Data Visualization

November 24: (Thanksgiving)

November 29: Privacy?

December 1: Crowd Database Systems