- Instructor: Immanuel Trummer,
411B Gates Hall; office hours: Wednesdays, 3-4pm or by appointment.
- Class: Tuesdays and Thursdays, 1:25-2:40pm; Bard Hall 140.
In this course, we review recent trends and foundational work in the
area of databases and large-scale data analysis. Starting from the
foundations of relational databases, we review recent research in areas
such as column stores, main-memory databases, query compilation, and
approximate query engines that aims at making data processing more
efficient. We cover parallel and distributed databases, NoSQL and
NewSQL systems, stream processing engines, graph databases, and systems
for data mining and large-scale machine learning. Finally, we review
approaches to making databases more user-friendly, including natural
language interfaces and automated data visualization.
A major component of this course is the course project, which requires you to
research a database-related problem of your choice.
Workload and Grading
- Several class presentations about a topic with associated research papers (25%)
- Participation in class discussions (25%)
- Write a (hopefully publishable) research paper in the area of
database systems (50%). You can do a project by yourself, or with
another student from the class.
- Topic selection. Please talk to me about the topic of your
project to make sure that the project is within the scope of the class.
Several high-level ideas for project topics will be presented in the
first course session. You should have selected a project topic by February 7.
- Project proposal with references. The proposal should contain
your goals for the project and the results of a thorough literature
search. The project proposal is due February 14.
- An intermediate status update the week of March 15. An
email to Immanuel is sufficient.
- The final project report. The project report should be
formatted like a regular paper for a conference submission (use the ACM
style). The final project is due May 2.
Introduction to the course (slides).
Basics, Architecture of a Database Management System
Section 1: Foundations
Selectivity Estimation & Robust Optimization
Logging and Recovery
Section 2: Efficient Query Processing
Main Memory Databases
Processing on Novel Hardware
(Massively) Parallel Processing
Optional: D. J. DeWitt, J. Gray: Parallel database systems: the future of high-performance database systems. CACM 1992.
Data Warehousing vs. MAD Analytics
Section 3: Efficient Transaction Processing
CAP Theorem vs. NoSQL Databases
Optional: S. Roy et al.: The homeostasis protocol: avoiding transaction coordination through program analysis. SIGMOD 2015.
Section 4: Beyond Relational Data Processing
Optional: Video of M. Stonebraker on "One size fits all: an idea whose time has come and gone".
Section 5: User Interfaces
Novel Query Interfaces
Optional: Video of VLDB 2015 Panel on "Design for Interaction"
Crowd Database Systems