CS 4320/5320 is an introduction to relational database systems, NoSQL and NewSQL systems, and other tools for large-scale data analysis. Topics covered include the relational model, SQL, query processing and optimization, transactions, recovery, NoSQL and NewSQL systems, database design, MapReduce, and Spark. The accompanying practicum course is CS 4321/5321.
All assignments will be distributed over the Course Management System (CMS). We will be using Piazza as a forum for student discussion, for announcements, and to distribute material such as lecture notes. Therefore, this website is deliberately fairly minimalistic.
Here are the contact details for the course staff:
The course meets three times a week, MWF 10:20am - 11:10am (Eastern Time). Lectures are given online via Zoom, and video recordings will be made available via multiple video sharing platforms. Zoom links are posted on Piazza before each lecture starts; links to recordings will be posted below after the lecture.
The lectures are interactive in the sense that students can ask questions or suggest answers to test questions during the lecture. All interactions must preserve anonymity (of students) as lecture recordings will be publicly available. Hence, all questions and answers must be posted in the Zoom chat (which is not visible in the recording). Chat questions will be answered between lecture modules.
Unlike prior offerings of this course, no intermediate or final exams are scheduled. The grade is based on seven homeworks, each of which carries an equal share of your final score. Only your best six submissions count towards your final score (i.e., your worst submission is dropped). Late submissions are generally not accepted.
Homeworks focus on recent course material; some may require small implementation efforts in SQL or Java. Homeworks are to be prepared and submitted by each student separately (i.e., no group work). We will use various means (including tools for detecting code similarity) to check for overlap among homework submissions. If we suspect an academic integrity violation, we will schedule a formal hearing, which may have serious consequences. Please do not cheat; it's not worth it.
In addition to the homework submission, short (five-minute) Zoom interviews with the instructor are scheduled for the best few (at least five) submissions for each homework. Questions focus on the homework submission and on exercises that are very similar to the homework. Typically, the interview will not change the score. If there is a large gap between the interview score and the submission score, the average of the two will be used as the final score for that homework.
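As a rough illustration of the interview rule above, here is a minimal Python sketch. The function name `homework_score` and the `gap_threshold` parameter are hypothetical: this page does not define what counts as a "large gap", so the default of 20 percentage points is purely an assumption for the example.

```python
def homework_score(submission, interview=None, gap_threshold=20.0):
    """Return the final score for one homework (scores in percent).

    gap_threshold is a hypothetical value; the course page does not
    state how large a gap must be before the two scores are averaged.
    """
    if interview is None:
        # No interview was scheduled for this submission.
        return submission
    if abs(submission - interview) > gap_threshold:
        # Large gap: use the average of submission and interview score.
        return (submission + interview) / 2
    # Typical case: the interview does not change the score.
    return submission
```

For example, under the assumed threshold, a submission score of 100 with an interview score of 50 would yield 75, while a submission score of 95 with an interview score of 90 would stay at 95.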
Grading is based on the average score of your best six homeworks. The cutoffs between grades vary from year to year. As a rough guideline, over the past years the cutoff for an A grade was typically around 90% and the cutoff for a B grade around 80%. These cutoffs serve merely as guidelines and may not be followed this semester, depending on the overall grade distribution under the new grading system. C and D grades are historically rare; a grade of D or worse typically means that a student has failed to submit multiple assignments.
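The best-six-of-seven rule can be sketched in a few lines of Python. This is only an illustration: the function name `final_score` is hypothetical, scores are assumed to be percentages, and a missing submission is assumed to count as 0.

```python
def final_score(homework_scores, counted=6):
    """Average of the best `counted` homework scores (in percent).

    Assumption: a homework that was not submitted appears as 0 in the list.
    """
    if len(homework_scores) < counted:
        raise ValueError("need at least `counted` homework scores")
    # Sort descending and keep the best `counted` scores,
    # which drops the worst submission when seven scores are given.
    best = sorted(homework_scores, reverse=True)[:counted]
    return sum(best) / counted
```

For instance, with scores of 100, 90, 80, 70, 60, 50, and 0 (one missed homework), the 0 is dropped and the average of the remaining six is 75.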
The following syllabus is tentative and may be adapted over the semester.
|3||Advanced SQL I||Link||Link|
|4||Advanced SQL II||Link||Link|
|5||Data Storage Fundamentals||Link||Link|
|7||Tree Indices II||Link||(same)|
|8||Hash Indices||Link||Link|
|9||Data Processing Fundamentals||Link||Link|
|10||Relational Operators I||Link||Link|
|11||Relational Operators II||Link||Link|
|12||Relational Operators III||Link||Link|
|15||Intro to Transactions/Concurrency Control||Link||Link|
|16||Concurrency Control Fundamentals||Link||(same)|
|18||Two-Phase Locking II||Link||Link|
|19||Non-Locking Concurrency Control||Link||Link|
|23||Summary of Material||Link||Link|
|28||Normalization III, DB Design Example||Link||(same)|
|29||Distributed DBMS II||Link||Link|
|33||Distributed Graph Processing||Link||Link|
|35||Distributed Stream Processing||Link||Link|
|36||Spatial Data I||Link||Link|
|37||Spatial Data II||Link||Link|
|38||Approximate Query Processing||Link|
The videos are also available at www.databaselecture.com, grouped by course chapter.
Time permitting, several cloud-based systems for relational data analysis will be discussed towards the end of the course.
Office hours will be posted on Piazza; changes to regular office hours will be posted as Piazza announcements. All office hours are virtual.
Roughly the first two thirds of the course are based on the textbook "Database Management Systems" by Raghu Ramakrishnan and Johannes Gehrke. The last third of the course discusses recently proposed systems; more details about them can be found in the corresponding research papers.