CS 4320 (and 5320): Introduction to Database Systems

CS4320/CS5320 is an introduction to relational database systems, NoSQL and NewSQL systems, and other tools for large-scale data analysis. Topics covered include the relational model, SQL, query processing and optimization, transactions, recovery, NoSQL and NewSQL systems, database design, MapReduce, and Spark. The accompanying practicum course is CS 4321/5321.

All assignments will be distributed through the Course Management System (CMS). We will use Piazza as a forum for student discussion, for announcements, and to distribute material such as lecture notes. This website is therefore deliberately minimal.

Staff

Here are the contact details for the course staff:

Lectures

The course meets three times a week, MWF 10:20am - 11:10am (Eastern Time). Lectures are given online via Zoom; video recordings will be made available on multiple video-sharing platforms. Zoom links are posted on Piazza before each lecture starts, and links to recordings will be posted in the syllabus below after each lecture.

The lectures are interactive: students can ask questions or suggest answers to test questions during the lecture. Because lecture recordings will be publicly available, all interactions must preserve student anonymity. Hence, all questions and answers must be posted in the Zoom chat, which is not visible in the recording. Chat questions will be answered between lecture modules.

Homeworks

Unlike prior offerings of this course, no midterm or final exams are scheduled. The grade is based on seven homeworks, each weighted equally. Only your best six submissions count toward your final score (i.e., your lowest-scoring submission is dropped). Late submissions are generally not accepted.

Homeworks focus on recent course material; some of them may require small implementation efforts in SQL or Java. Each student must prepare and submit homeworks individually (i.e., no group work). We will use various means (including tools for detecting code similarity) to check for overlap among homework submissions. If we suspect an academic integrity violation, we will schedule a formal hearing, which may have serious consequences. Please do not cheat; it's not worth it.

In addition to the homework submission, short (five-minute) Zoom interviews with the instructor are scheduled for the best few (at least five) submissions for each homework. Questions focus on the homework submission and on exercises that are very similar to the homework. Typically, the interview does not change the score. If there is a large gap between the interview score and the submission score, the average of the two is used as the final score for that homework.

Grading

Grading is based on the average score of your best six homeworks. The cutoffs between grades vary from year to year. As a rough guideline, over the past years the cutoff for an A grade was typically around 90% and the cutoff for a B grade around 80%. These cutoffs serve merely as guidelines and may not be followed this semester, depending on the overall grade distribution under the new grading system. C and D grades are historically rare; a grade of D or worse typically means that a student has failed to submit multiple assignments.
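As a rough illustration of the rule above (seven equally weighted homeworks, the lowest one dropped, the remaining six averaged), the following Python sketch computes a final score. It is a minimal sketch under stated assumptions: the function name `final_score` is hypothetical, each homework score is assumed to be a percentage, a missing submission is assumed to count as 0, and interview adjustments and grade cutoffs are not modeled.

```python
def final_score(homework_scores, counted=6):
    """Average of the best `counted` homework scores.

    Hypothetical helper. Assumes scores are percentages (0-100) and that
    a missing submission has been entered as 0.
    """
    if len(homework_scores) < counted:
        raise ValueError(f"expected at least {counted} homework scores")
    # Sort descending and keep the top `counted` scores; the rest are dropped.
    best = sorted(homework_scores, reverse=True)[:counted]
    return sum(best) / counted


if __name__ == "__main__":
    # Seven homeworks; the missed one (0) is the score that gets dropped.
    scores = [95, 88, 0, 92, 90, 85, 97]
    print(round(final_score(scores), 2))
```

For example, with the scores above the 0 is dropped and the other six (sum 547) are averaged, giving about 91.17.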

Syllabus

The following syllabus is tentative and may be adapted over the semester.

Lecture | Topic | Recording | Slides
1 | Introduction, Logistics | Link | Link
2 | SQL Basics | Link | Link
3 | Advanced SQL I | Link | Link
4 | Advanced SQL II | Link | Link
5 | Data Storage Fundamentals | Link | Link
6 | Tree Indices | Link | Link
7 | Tree Indices II | Link | (same)
8 | Hash Indices | Link | Link
9 | Data Processing Fundamentals | Link | Link
10 | Relational Operators I | Link | Link
11 | Relational Operators II | Link | Link
12 | Relational Operators III | Link | Link
13 | Query Optimization | Link | Link
14 | Query Optimization/Transactions | Link | Link
15 | Intro to Transactions/Concurrency Control | Link | Link
16 | Concurrency Control Fundamentals | Link | (same)
17 | Two-Phase Locking | Link | Link
18 | Two-Phase Locking II | Link | Link
19 | Non-Locking Concurrency Control | Link | Link
20 | Recovery | Link | Link
21 | Recovery II | Link | Link
22 | Recovery III | Link | Link
23 | Summary of Material | Link | Link
24 | Database Design | Link | Link
25 | Normal Forms | Link | Link
26 | Normalization I | Link | (same)
27 | Normalization II | Link | (same)
28 | Normalization III, DB Design Example | Link | (same)
29 | Distributed DBMS II | Link | Link
30 | Eventual Consistency | Link | Link
31 | NewSQL | Link | Link
32 | Graph Databases | Link | Link
33 | Distributed Graph Processing | Link | Link
34 | Data Streams | Link | Link
35 | Distributed Stream Processing | Link | Link
36 | Spatial Data I | Link | Link
37 | Spatial Data II | Link | Link
38 | Approximate Query Processing | Link |
39 | Conclusion | Link |

We also made the videos available at www.databaselecture.com, grouped by course chapter.

Time permitting, several cloud-based systems for relational data analysis will be discussed toward the end of the course.

Office Hours

Office hours are posted on Piazza; changes to regular office hours will be posted as Piazza announcements. All office hours are virtual.

Additional Material

Roughly the first two-thirds of the course is based on the textbook "Database Management Systems" by Raghu Ramakrishnan and Johannes Gehrke. The last third of the course discusses recently proposed systems; more details about them can be found in the corresponding research papers.