4/13: The final
presentations are on 5/11/2001 in the systems lab, starting at 2pm. Prepare a 15 minute
presentation about your project. Your final
project report is due at that time as well. The final exam (48 hours) will be
handed out after the presentations. 
2/15: Remember that we moved the start
time of the class to 2:50pm due to the colloquia directly after class.
2/5: The reading list has been nearly
completed. 
2/5: One project has been taken (see the list
of projects). If you have any questions about the projects, feel welcome to
stop by this afternoon to talk with me.
1/30: The readings can only be accessed
from within the CUCS firewall due to copyright issues.
1/28: A project timeline has been added at
the bottom of this page.
1/28: The list of project suggestions is available online.
If you decide on a project, send me email, and I will send you a list of related
work. You can do a project by yourself, or you can work in a group of students.
I will stay after class on Tuesday, so that we can talk about projects if you
have any questions.
1/23: You can sign up for your presentation date by sending
email to johannes@cs.cornell.edu 
Class: TR 2:50-4:05, Thurston 203
Instructor
Johannes Gehrke
4108 Upson Hall; johannes@cs.cornell.edu
Office hours: Wednesdays 2:30-3:30pm, or by appointment.
Teaching Assistant
Alexandre
Evfimievski
5136 Upson Hall;  aevf@cs.cornell.edu; 
Office hours: Mondays 2:30-3:30pm, or by appointment
CS632 covers on-going trends in database systems, focusing on database fundamentals, but also illustrating new research directions and current problems. The course pre-requisite is CS 432 or equivalent introductory background material in database systems. We will study a collection of papers, some of these are "classics" from the early days of relational databases, others are more recent papers that have influenced the field. At the end of the course, you will have an in-depth understanding of the state of the art in database systems.
| Date | Topic and Paper | Presenter | Notes | 
| 1/23 | Introduction to the course; review of some CS432
      material Slides on relational algebra: PDF, PDF (6 slides per page) Slides on normalization: PDF, PDF (6 slides per page) | Johannes | |
| 1/25 | Buffer management Hong-Tai Chou, David J. DeWitt: An Evaluation of Buffer Management Strategies for Relational Database Systems. VLDB 1985: 127-141 Join Processing Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. TODS 11(3): 239-264(1986) | Johannes | |
| 1/30 | Query optimization Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 Yannis E. Ioannidis, Younkyung Cha Kang: Randomized Algorithms for Optimizing Large Join Queries. SIGMOD Conference 1990: 312-321 | Johannes | |
| 2/1 | Selectivity estimation "Improved Histograms for Selectivity Estimation of Range Predicates", by V. Poosala, Y.E. Ioannidis, P. J. Haas, and E. Shekita, Proc. of the 1996 ACM SIGMOD Conference, Montreal, Canada, June 1996. "Balancing Histogram Optimality and Practicality for Query Result Size Estimation", by Y.E. Ioannidis and V. Poosala, Proc. of the 1995 ACM SIGMOD Conference, San Jose, CA, May 1995, pages 233-244. | Johannes | |
| 2/6 | Concurrency control Rakesh Agrawal, Michael J. Carey, Miron Livny: Models for Studying Concurrency Control Performance: Alternatives and Implications. SIGMOD Conference 1985: 108-121 H. T. Kung, John T. Robinson: On Optimistic Methods for Concurrency Control. TODS 6(2): 213-226(1981) Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil: A Critique of ANSI SQL Isolation Levels. SIGMOD Conference 1995: 1-10 | Nate Nystrom | |
| 2/8 | Recovery Theo Härder, Andreas Reuter: Principles of Transaction-Oriented Database Recovery. Computing Surveys 15(4): 287-317(1983) C. Mohan, Donald J. Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter Schwarz: ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. TODS 17(1): 94-162(1992) | Cristian Bucila | |
| 2/13 | Distributed database systems Lothar F. Mackert, Guy
      M. Lohman: R*
      Optimizer Validation and Performance Evaluation for Local Queries. SIGMOD
      Conference 1986: 84-95  | Michael Morse | |
| 2/15 | Distributed query processing C. Mohan, Bruce G. Lindsay, Ron Obermarck: Transaction Management in the R* Distributed Database Management System. TODS 11(4): 378-396 (1986) The following two papers are on the Garlic
      Project Home Page. Laura M. Haas, Donald
      Kossmann, Edward
      L. Wimmers, Jun
      Yang: Optimizing Queries Across Diverse Data Sources. VLDB
      1997: 276-285 | Ashish Motivala | |
| 2/20 | Replication (and its dangers) Jim Gray, Pat Helland, Patrick E. O'Neil, Dennis Shasha: The Dangers of Replication and a Solution. SIGMOD Conf. 1996: 173-182 | Guest lecture by Professor Al Demers. | |
| 2/22 | Thor Providing Persistent Objects in Distributed Systems. Proceedings of the 13th European Conference on Object-Oriented Programming (ECOOP '99), Lisbon, Portugal, June 1999. [PDF] Barbara Liskov, Miguel Castro, Liuba Shrira, Atul Adya HAC: Hybrid Adaptive Caching for Distributed Storage Systems. Proceedings of the ACM Symposium on Operating System Principles (SOSP '97), Saint Malo, France, October 1997. [PDF] Miguel Castro, Atul Adya, Barbara Liskov, Andrew C. Myers | Guest lecture by Andrew Myers | |
| 2/27 | Main-memory database systems: Dali The Architecture of the Dali Storage Manager. A recent summary of Dali's architecture, features, and research contributions. This is the best overview of the system as it is today. Published in the Journal of Multimedia Tools and Applications, 4/2, 1997. Tobin J. Lehman, Michael J. Carey: A Study of Index Structures for Main Memory Database Management Systems. VLDB 1986: 294-303 | Adina Costea | |
| 3/1 | Main-memory database systems ``Cache Conscious Indexing for Decision-Support in Main Memory'' J. Rao and K. A. Ross, Proceedings of the 1999 VLDB Conference, September 1999. postscript ``Making B+-Trees Cache Conscious in Main Memory'' J. Rao and K. A. Ross, Proceedings of the 2000 SIGMOD Conference, May 2000. postscript Peter A. Boncz, Stefan Manegold, Martin L. Kersten: Database Architecture Optimized for the New Bottleneck: Memory Access. VLDB 1999: 54-65 | Guest lecture by Zhiyuan Chen | |
| 3/6 | Decision support: Bitmap indices Patrick
      E. O'Neil, Dallan
      Quass: Improved
      Query Performance with Variant Indexes. SIGMOD
      Conference 1997: 38-49 | Rohit Ananthakrishna Rohit's disk crashed and the slides are lost. | |
| 3/8 | Decision support: Computing the cube Sameet
      Agarwal, Rakesh
      Agrawal, Prasad
      Deshpande, Ashish
      Gupta, Jeffrey
      F. Naughton, Raghu
      Ramakrishnan, Sunita Sarawagi: On the Computation of Multidimensional
      Aggregates. VLDB
      1996: 506-521 | Abhinandan Das | |
| 3/13 | Decision support: OLAP Venky
      Harinarayan, Anand
      Rajaraman, Jeffrey
      D. Ullman: Implementing Data Cubes Efficiently. SIGMOD
      Conf. 1996: 205-216  | Ali Anvari | |
| 3/15 | Database Tuning Optional Reading: Data Engineering Bulletin June 1999 | Guest lecture by Philippe Bonnet | Philippe Bonnet is an expert on database tuning. | 
| 3/27 | Online aggregation Joseph M. Hellerstein, Peter
      J. Haas, Helen
      Wang: Online Aggregation. SIGMOD
      Conference 1997: 171-182 Peter
      J. Haas, Joseph M. Hellerstein: Ripple Joins for Online Aggregation. SIGMOD
      Conference 1999: 287-298 | Yanling Wang | |
| 3/29 | Parallel database systems and high-performance
      sorting David J. DeWitt, Jim
      Gray: Parallel Database Systems: The Future of High Performance
      Database Systems. CACM
      35(6): 85-98 (1992) David J. DeWitt, Shahram
      Ghandeharizadeh, Donovan
      A. Schneider, Allan
      Bricker, Hui-I
      Hsiao, Rick
      Rasmussen: The Gamma Database Machine Project. TKDE
      2(1): 44-62 (1990) The papers are on Tobias' web site. | Guest lecture by Tobias Mayr | |
| 4/3 | Data mining Rakesh
      Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association
      Rules in Large Databases. VLDB
      1994: 487-499 | Kevin Seng | |
| 4/5 | Data mining Manish
      Mehta, Rakesh Agrawal, Jorma
      Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT
      1996: 18-32 John
      C. Shafer, Rakesh Agrawal, Manish
      Mehta: SPRINT: A Scalable Parallel Classifier for Data Mining. VLDB
      1996: 544-555 | Tomi Yiu | |
| 4/10 | High-dimensional indexing Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft: When Is ''Nearest Neighbor'' Meaningful? ICDT 1999: 217-235 | Dan Kifer | Project overview due in
      class 
 | 
| 4/12 | Nearest neighbor queries Nick Roussopoulos, Stephen Kelley, Frederic Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79. http://www.cs.umd.edu/~nick/papers/nnpaper.html G. R. Hjaltason and H. Samet, Distance browsing in spatial databases, ACM Transactions on Database Systems 24, 2 (June 1999), 265-318. http://www.cs.umd.edu/~grh/incnear2.ps.gz | Sung-hsun Su | |
| 4/17 | Object-relational database systems Michael Stonebraker: Inclusion of New Types in Relational Data Base
      Systems. ICDE
      1986: 262-269 Michael Stonebraker, Greg
      Kemnitz: The Postgres Next Generation Database Management System. CACM
      34(10): 78-92 (1991) | Yong Yao | Johannes out of town | 
| 4/19 | Client-server caching and Object Stores Seth J. White, David J. DeWitt: QuickStore: A High Performance Mapped Object Store. VLDB Journal 4(4): 629-673 (1995) Michael J. Franklin, Michael J. Carey: Client-Server Caching Revisited. IWDOM 1992: 57-78 | Ben Atkin | Johannes out of town | 
| 4/24 | Database systems and XML J. Shanmugasundaram, E. Shekita, R. Barr, M. Carey, B. Lindsay, H. Pirahesh, B. Reinwald, "Efficiently Publishing Relational Data as XML Documents", VLDB Conference, September 2000. Click here for the slides. J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, J. Naughton, "Relational Databases for Querying XML Documents: Limitations and Opportunities," VLDB Conference, September 1999. Click here for the slides. | David Wu | |
| 4/26 | Deductive database systems Chapter 27, Ramakrishnan and Gehrke: Database Management Systems (Second Edition) | Qinghong Zhang and Jinghao Zhang | |
| 5/1 | no class; work on the projects | ||
| 5/3 | The Future of DBMS Research TBA | Johannes | 
Course project: As
part of the course, you will do a database/data mining research project in
groups of two students. Usually several of the projects will lead to publishable
results, and I encourage you to have a publication as a goal in the course. I
will provide a list of projects, and you can choose from this list, but you can
also choose your own project. 
The project will have the following milestones:
1. Survey of the relevant literature. A survey of the relevant literature
related to the project.
2. A detailed description of your project. The overview includes a description
of your algorithm and ideas, the design of the piece of software you are
proposing to build, a description of how you propose to evaluate your ideas, and
the design of the experiments that you will run.
3. Implementation and performance analysis.
4. Final project report and a short presentation to the class about your course
project.
Grading:
Your paper presentation in class: 15%
The class project: 50%
Two exams: 30% (15% each)
Class participation: 5%
2
3/6 Literature survey and draft of the project overview due in class
3/13 Handout of the take-home midterm
3/15 Midterm due in class
4/10 Project overview with detailed description of the algorithms and
design of the experimental evaluation due in class
5/11 Final project presentations and final project reports due
5/11 Handout of the final exam after the presentations (48 hours)
5/13 Handin of the final exam