CS6320
Logistics
- Instructor: Immanuel Trummer,
411B Gates Hall; office hours: Wednesdays, 3-4pm or by appointment.
- Class: Tuesdays and Thursdays, 1:25-2:40pm; Hollister Hall 206.
- Exceptional class sessions on Fridays, 9:50-11:05am, in Gates Hall 405.
Course Description
In this course, we review recent trends and foundational work in the
area of databases and large-scale data analysis. Starting from the
foundations of relational databases, we review recent research in areas
such as column stores, main-memory databases, query compilation, and
approximate query engines that aims at making data processing more
efficient. We cover parallel and distributed databases, NoSQL and
NewSQL systems, stream processing engines, graph databases, and systems
for data mining and large-scale machine learning. Finally, we review
approaches to make databases more user-friendly, including natural
language interfaces and automated data visualization.
An important component of this course is the course project, which
requires you to research a database-related problem of your choice.
Workload and Grading
- One paper review every week (25%). Writing a review is a
great way to familiarize yourself with a new paper and its context.
Also, regularly writing reviews helps you to become a better researcher
and paper writer. A good review consists of the following parts:
- Short synopsis of paper contributions
- Summary describing problem, approach, and main results
- At least three weak points and at least three strong points
- Consider presentation, novelty, relevance, underlying
assumptions, ...
- Detailed comments
- Justify strong/weak points in more detail; point to
possible improvements
Reviews are due on Mondays by midnight. You may freely choose the paper
to review from those discussed in the corresponding week, unless
announced otherwise in the lecture.
- Several class presentations about a topic with associated research
papers; participation in discussions etc. (25%)
- Write a (hopefully publishable) research paper in the area of
database systems (50%). You can do a project by yourself, or with
another student from the class.
- Topic selection. Please talk to me about the topic of your
project to make sure that the project is within the scope of the class.
Several high-level ideas for project topics will be presented in the
first course session. You should have selected a project topic by September 13.
- Project proposal with references. The proposal should contain
your goals for the project and the results of a thorough literature
search. The project proposal is due October 3.
- An intermediate status update is due the week of November 1. An
email to Immanuel is sufficient.
- The final project report. The project report should be
formatted like a regular paper for a conference submission (use the ACM
style). The final project is due December 15.
Course Schedule (Draft)
August 23: Introduction to the course
August 25: Basics, Architecture of a Database Management System
Section 1: Foundations
August 30: Joins
September 1: Indexing
September 6: (VLDB)
September 8: (VLDB)
September 13: Query Optimization
September 15: Selectivity Estimation & Robust Optimization
September 16: Concurrency Control (Location: Gates 405!)
September 20: Logging and Recovery
September 22: Buffer Management
Section 2: Efficient Query Processing
September 27: Column Stores
September 29: Main Memory Databases
October 4: Query Compilation (Nancy)
October 6: Online/Approximate Processing
October 11: (Fall Break)
October 13: Processing on Novel Hardware
October 18: (Massively) Parallel Processing
Optional: D. J. DeWitt, J. Gray: Parallel database systems: the future of high-performance database systems. CACM 1992.
October 20: Data Warehousing vs. MAD Analytics (Jenny)
Section 3: Efficient Transaction Processing
October 25: CAP Theorem vs. NoSQL Databases
October 27: NewSQL (Kai)
November 1: Coordination Avoidance
Optional: S. Roy et al.: The homeostasis protocol: avoiding transaction coordination through program analysis. SIGMOD 2015.
Section 4: Beyond Relational Data Processing
Optional: Video of M. Stonebraker on "One size fits all: an idea whose time has come and gone".
November 3: Graph Databases (Nancy)
November 8: Stream Processing (Kai)
November 10: Machine Learning (Sanjana)
November 15: Knowledge Mining
Section 5: User Interfaces
November 17: Novel Query Interfaces (Sanjana)
Optional: Video of VLDB 2015 Panel on "Design for Interaction"
November 22: Data Visualization
November 24: (Thanksgiving)
November 29: Privacy?
December 1: Crowd Database Systems