CS6320
Logistics
- Instructor: Immanuel Trummer, 411B Gates Hall; office hours: Wednesdays, 3-4pm.
- Class: Tuesdays and Thursdays, 1:25-2:40pm; Upson Hall 216.
Course Description
In this course, we review recent trends and foundational work in
the area of databases and large-scale data analysis. Starting from
the foundations of relational databases, we review recent research
that aims to make data processing more efficient, in areas such as
column stores, main-memory databases, query compilation, and
approximate query engines. We cover parallel and distributed
databases, NoSQL and NewSQL systems, stream processing engines,
graph databases, and systems for data mining and large-scale
machine learning. Finally, we review approaches to make databases
more user-friendly, including natural language interfaces and
automated data visualization.
An important component of this course is the course project, which
requires you to research a database-related problem of your
choice.
Workload and Grading
- Several in-class presentations and participation in discussions (50% of grade).
  - Each presentation is given by two students.
  - Encourage participation and discussion!
- Course project (50% of grade).
  - Teams of up to three students per project.
  - Select a topic and write a one-page summary within two weeks.
  - Intermediate progress report (two pages) due by March 15.
  - Final report (six pages) due by May 7.
  - Page counts assume reasonable margins and font size, e.g. the standard LaTeX article class.
Course Schedule (Draft)
Introduction to the course
Slides.
Basics, Architecture of a Database Management System
Section 1: Foundations
Indexing
Slides.
Joins
Query Optimization
Concurrency Control
Logging and Recovery
Buffer Management
Data Archival
Section 2: Efficient Query Processing
Column Stores
Main Memory Databases
Query Compilation
Online/Approximate Processing
Processing on Novel Hardware
(Massively) Parallel Processing
Optional:
D. J. DeWitt, J. Gray: Parallel database systems: the future of high-performance database systems. CACM 1992.
Section 3: Efficient Transaction Processing
CAP Theorem vs. NoSQL Databases
NewSQL
Coordination Avoidance
Optional:
S. Roy et al.: The homeostasis protocol: avoiding transaction coordination through program analysis. SIGMOD 2015.
Section 4: Beyond Relational Data Processing
Optional:
Video of M. Stonebraker on "One size fits all: an idea whose time has come and gone".
Graph Databases
Stream Processing
Visual Data and Videos
Machine Learning
Knowledge Mining
Section 5: User Interfaces
Optional:
Video of VLDB 2015 Panel on "Design for Interaction".
Gestural Interfaces and Augmented Reality
Natural Language and Voice Interfaces
Data Visualization