CS4321/CS5321 is a practicum course to accompany CS 4320. You will work together in groups to develop a simple database management system in Java. This system will feature a simple SQL parser, the capability to index data for fast access, multiple implementations of relational operators such as joins, and a simple query optimizer.
The project is divided into multiple phases, in each of which specific system components are developed. No reference code is provided during the project; you develop your system from scratch and extend it in each project phase.
The main platform for distributing course material and for submitting your results is the Course Management System (CMS). In addition, we will be using Piazza as a forum for student discussion and for announcements.
Here are the contact details for the course staff:
The project is divided into five phases, summarized below:
| Phase | Description | Due date |
|---|---|---|
| 1 | Build a simple Java interpreter for SQL queries | 9/22 |
| 2 | Increase efficiency by processing data in batches (pages) and by smarter operator implementations (block nested loops join, sort-merge join) | 10/16 |
| 3 | Speed up data access by implementing indices (B+ tree indices) | 10/30 |
| 4 | Add a query optimizer that compares candidate plans and chooses the best processing plan based on data statistics | 11/20 |
| 5 | Extensions, with several options: parallelize your implementation or implement a query optimizer based on machine learning. Brave souls with a taste for theory may instead attempt to implement a worst-case optimal join algorithm. | 12/16 |
Please post questions on Piazza first: answers posted there benefit other students as well, and chances are good that others have the same question as you do. If your question cannot be resolved via Piazza, consider attending office hours.
TAs and the instructor will hold virtual office hours during specific time slots; the time slots and associated Zoom links will be posted on Piazza. Debugging code can be time-consuming, and TAs need to divide their time fairly among all interested students. Hence, we typically limit each group to ten minutes per office hour.
Your grade for project phases one to four is mostly based on automated tests. To get full points, your code needs to pass all tests. A fraction of the points is awarded for style: you receive these points if your code is reasonably readable and commented. For some project phases, you must also submit a benchmark report, which accounts for a further fraction of the points. The precise breakdown of points is specified in the detailed instructions for each project phase.
Your grade for phase five is based on correctness (i.e., your implementation must still generate correct query results) and on whether you have achieved the goal you set for yourself (one of the three options: parallelizing all operators, implementing an ML-based optimizer, or implementing a worst-case optimal join algorithm). A small fraction of the points is awarded for a short report (at most two pages) analyzing the performance difference between the phase four and phase five implementations.
As a rough guideline: in past years, a score of 90% of the maximum typically corresponded to an A grade, and a score of 80% to a B grade. Lower grades are very rare and are typically given only to students who missed multiple assignments. Note that the precise cutoffs may vary from year to year, depending on the overall grade distribution; the percentages above are therefore tentative.
Each team has two slip days over the entire semester. This means you may submit one phase up to two days late without penalty or, alternatively, submit two phases one day late each.
Occasionally, a single small mistake causes an implementation to fail most of the automated tests and thereby lose most of its points. In such cases, you may submit a short fix (i.e., a few lines of code; you may not rewrite the implementation) via the regrade submission functionality of CMS. The fix must be submitted within one week after the grade has been released.
You may collaborate only with members of your own group. You may not copy code from other teams or external sources, except for standard libraries and code provided as part of an assignment from this course. If you are unsure whether you are allowed to use specific code, please check with the course staff. We take academic integrity seriously and will investigate if we suspect an academic integrity violation.
The course meets on Fridays, 11:30am to 12:20pm, at (or before) the beginning of each project phase (possibly shorter if there are few questions). A Zoom link will be posted on Piazza before each meeting, and a video recording of the meeting will be made available. Please ask all questions in the Zoom chat during the meetings, so that you remain anonymous when the video recording is released.
The meeting dates are as follows:
| Date | Topic | Slides | Recording |
|---|---|---|---|
| 9/4 | Introduction, presentation of project phase 1 | Slides 1 | Link |
| 9/18 | Presentation of project phase 2 | Slides 2 | Link |
| 10/16 | Presentation of project phase 3 | Slides 3 | Link |
| 10/30 | Presentation of project phase 4 | Slides 4 | Link |
| 11/13 | Presentation of project phase 5 | Slides 5 | Link |