CS6320 – Big Data

Logistics

Course Description

Big Data is a research area motivated by many practical problems, and it has produced elegant systems abstractions and interesting algorithms. In this course, we will explore the beauty of some foundational and recent work in this area, which lies at the intersection of systems, algorithms, and programming languages.

Workload and Grading

(Draft) Course Schedule

August 23: Introduction to the course

August 28: The relational model, relational algebra, and normalization (Presenter: Johannes)

August 30: Query processing and query optimization (Presenter: Johannes)

As we start discussing more database internals over the next few weeks, please read the following paper as background reading (do not be scared by its length; it is easy to read):

September 4: Buffer management and selectivity estimation (Presenter: Johannes)

September 6: Concurrency control (Presenter: Johannes)

September 11: Recovery (Presenter: Alan Demers)

September 13: Index Structures I (Presenter: Alan Demers)

September 18: Index Structures II (Presenter: Hema Koppula)

September 20: Main-Memory Database Systems (Presenter: Jiexun Xu)

September 25: Decision Support (Presenter: Albert Liu)

September 27: Decision Support II (Presenter: Kevin James Matzen)

October 2: Index Structures III (Presenter: Eoin O'Mahony)

October 4: Online aggregation (Presenter: Shuo Chen)

October 9: Fall break, no class.

October 11: Approximate query answering I (Presenter: Jon Park)

Note: No summaries are due for October 16 and 18; use the time to work on your course projects.

October 16: Approximate query answering II (Presenter: Daniel Cabrini Hauagge)

October 18: Distributed Transaction Management and Replication (Presenter: Yohan Ko)

October 23: Parallel Database Systems (Presenter: Lucja Kot)

October 25: Column Stores (Presenter: Lucja Kot)

October 30: Data stream algorithms I (Presenter: Yexiang Xue)

November 1: Data stream algorithms II (Presenter: Bishan Yang)

November 6: Fagin’s Algorithm (Presenter: Yang Yuan)

November 8: Frequent itemsets, association rules, and sequential patterns (Presenter: Johannes)

November 13: No class.

November 15: Data Stream Systems (Guest Lecture by Walker White)

November 20: Probabilistic Database Systems (Presenter: Vasilis Syrganis)

November 22: Thanksgiving, no class.

November 27: Paxos (Guest Lecture by Robbert Van Renesse)

November 29: Distribution and Fault Tolerance (Presenter: Johannes)