CS6320

Logistics

Course Description

In this course, we review recent trends and foundational work in databases and large-scale data analysis. Starting from the foundations of relational databases, we review recent research aimed at making data processing more efficient, in areas such as column stores, main-memory databases, query compilation, and approximate query engines. We cover parallel and distributed databases, NoSQL and NewSQL systems, stream processing engines, graph databases, and systems for data mining and large-scale machine learning. Finally, we review approaches that make databases more user-friendly, including natural language interfaces and automated data visualization.

An important component of this course is the course project, which requires you to research a database-related problem of your choice.

Workload and Grading

Course Schedule (Draft)

Introduction to the course

Slides.

Basics, Architecture of a Database Management System

Section 1: Foundations

Indexing

Slides.

Joins

Query Optimization

Concurrency Control

Logging and Recovery

Buffer Management

Data Archival

Section 2: Efficient Query Processing

Column Stores

Main Memory Databases

Query Compilation

Online/Approximate Processing

Processing on Novel Hardware

(Massively) Parallel Processing

Optional: D. J. DeWitt, J. Gray: Parallel database systems: the future of high-performance database systems. CACM 1992.

Section 3: Efficient Transaction Processing

CAP Theorem vs. NoSQL Databases

NewSQL

Coordination Avoidance

Optional: S. Roy et al.: The homeostasis protocol: avoiding transaction coordination through program analysis. SIGMOD 2015.

Section 4: Beyond Relational Data Processing

Optional: Video of M. Stonebraker on "One size fits all: an idea whose time has come and gone".

Graph Databases

Stream Processing

Visual Data and Videos

Machine Learning

Knowledge Mining

Section 5: User Interfaces

Optional: Video of VLDB 2015 Panel on "Design for Interaction".

Gestural Interfaces and Augmented Reality

Natural Language and Voice Interfaces

Data Visualization