4/13: The final
presentations are on 5/11/2001 in the systems lab, starting at 2pm. Prepare a 15 minute
presentation about your project. Your final
project report is due at that time as well. The final exam (48 hours) will be
handed out after the presentations.
2/15: Remember that we moved the start
time of the class to 2:50pm due to the colloquia directly after class.
2/5: The reading list has been nearly
completed.
2/5: One project has been taken (see the list
of projects). If you have any questions about the projects, feel welcome to
stop by this afternoon to talk with me.
1/30: The readings can only be accessed
from within the CUCS firewall due to copyright issues.
1/28: A project timeline has been added at
the bottom of this page.
1/28: The list of project suggestions is available online.
If you decide on a project, send me email, and I will send you a list of related
work. You can do a project by yourself, or you can work in a group of students.
I will stay after class on Tuesday, so that we can talk about projects if you
have any questions.
1/23: You can sign up for your presentation date by sending
email to johannes@cs.cornell.edu
Class: TR 2:50-4:05, Thurston 203
Instructor
Johannes Gehrke
4108 Upson Hall; johannes@cs.cornell.edu
Office hours: Wednesdays 2:30-3:30pm, or by appointment.
Teaching Assistant
Alexandre
Evfimievski
5136 Upson Hall; aevf@cs.cornell.edu;
Office hours: Mondays 2:30-3:30pm, or by appointment
CS632 covers on-going trends in database systems, focusing on database fundamentals, but also illustrating new research directions and current problems. The course pre-requisite is CS 432 or equivalent introductory background material in database systems. We will study a collection of papers, some of these are "classics" from the early days of relational databases, others are more recent papers that have influenced the field. At the end of the course, you will have an in-depth understanding of the state of the art in database systems.
Date | Topic and Paper | Presenter | Notes |
1/23 | Introduction to the course; review of some CS432
material
Slides on relational algebra: PDF, PDF (6 slides per page) Slides on normalization: PDF, PDF (6 slides per page) |
Johannes | |
1/25 | Buffer management
Hong-Tai Chou, David J. DeWitt: An Evaluation of Buffer Management Strategies for Relational Database Systems. VLDB 1985: 127-141 Join Processing Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. TODS 11(3): 239-264(1986) |
Johannes | |
1/30 | Query optimization
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 Yannis E. Ioannidis, Younkyung Cha Kang: Randomized Algorithms for Optimizing Large Join Queries. SIGMOD Conference 1990: 312-321 |
Johannes | |
2/1 | Selectivity estimation
"Improved Histograms for Selectivity Estimation of Range Predicates", by V. Poosala, Y.E. Ioannidis, P. J. Haas, and E. Shekita, Proc. of the 1996 ACM SIGMOD Conference, Montreal, Canada, June 1996. "Balancing Histogram Optimality and Practicality for Query Result Size Estimation", by Y.E. Ioannidis and V. Poosala, Proc. of the 1995 ACM SIGMOD Conference, San Jose, CA, May 1995, pages 233-244. |
Johannes | |
2/6 | Concurrency control
Rakesh Agrawal, Michael J. Carey, Miron Livny: Models for Studying Concurrency Control Performance: Alternatives and Implications. SIGMOD Conference 1985: 108-121 H. T. Kung, John T. Robinson: On Optimistic Methods for Concurrency Control. TODS 6(2): 213-226(1981) Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil: A Critique of ANSI SQL Isolation Levels. SIGMOD Conference 1995: 1-10 |
Nate Nystrom | |
2/8 | Recovery
Theo H�rder, Andreas Reuter: Principles of Transaction-Oriented Database Recovery. Computing Surveys 15(4): 287-317(1983) C. Mohan, Donald J. Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter Schwarz: ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. TODS 17(1): 94-162(1992) |
Cristian Bucila | |
2/13 | Distributed database systems
Lothar F. Mackert, Guy
M. Lohman: R*
Optimizer Validation and Performance Evaluation for Local Queries. SIGMOD
Conference 1986: 84-95 |
Michael Morse | |
2/15 | Distributed query processing C. Mohan, Bruce G. Lindsay, Ron Obermarck: Transaction Management in the R* Distributed Database Management System. TODS 11(4): 378-396 (1986) The following two papers are on the Garlic
Project Home Page. Laura M. Haas, Donald
Kossmann, Edward
L. Wimmers, Jun
Yang: Optimizing Queries Across Diverse Data Sources. VLDB
1997: 276-285 |
Ashish Motivala | |
2/20 | Replication (and its dangers)
Jim Gray, Pat Helland, Patrick E. O'Neil, Dennis Shasha: The Dangers of Replication and a Solution. SIGMOD Conf. 1996: 173-182 |
Guest lecture by Professor Al Demers. | |
2/22 | Thor
Providing Persistent Objects in Distributed Systems. Proceedings of the 13th European Conference on Object-Oriented Programming (ECOOP '99), Lisbon, Portugal, June 1999. [PDF] Barbara Liskov, Miguel Castro, Liuba Shrira, Atul Adya HAC: Hybrid Adaptive Caching for Distributed Storage Systems. Proceedings of the ACM Symposium on Operating System Principles (SOSP '97), Saint Malo, France, October 1997. [PDF] Miguel Castro, Atul Adya, Barbara Liskov, Andrew C. Myers |
Guest lecture by Andrew Myers | |
2/27 | Main-memory database systems: Dali
The Architecture of the Dali Storage Manager. A recent summary of Dali's architecture, features, and research contributions. This is the best overview of the system as it is today. Published in the Journal of Multimedia Tools and Applications, 4/2, 1997. Tobin J. Lehman, Michael J. Carey: A Study of Index Structures for Main Memory Database Management Systems. VLDB 1986: 294-303 |
Adina Costea | |
3/1 | Main-memory database systems
``Cache Conscious Indexing for Decision-Support in Main Memory'' J. Rao and K. A. Ross, Proceedings of the 1999 VLDB Conference, September 1999. postscript ``Making B+-Trees Cache Conscious in Main Memory'' J. Rao and K. A. Ross, Proceedings of the 2000 SIGMOD Conference, May 2000. postscript Peter A. Boncz, Stefan Manegold, Martin L. Kersten: Database Architecture Optimized for the New Bottleneck: Memory Access. VLDB 1999: 54-65 |
Guest lecture by Zhiyuan Chen | |
3/6 | Decision support: Bitmap indices Patrick
E. O'Neil, Dallan
Quass: Improved
Query Performance with Variant Indexes. SIGMOD
Conference 1997: 38-49 |
Rohit Ananthakrishna
Rohit's disk crashed and the slides are lost. |
|
3/8 | Decision support: Computing the cube
Sameet
Agarwal, Rakesh
Agrawal, Prasad
Deshpande, Ashish
Gupta, Jeffrey
F. Naughton, Raghu
Ramakrishnan, Sunita Sarawagi: On the Computation of Multidimensional
Aggregates. VLDB
1996: 506-521 |
Abhinandan Das | |
3/13 | Decision support: OLAP Venky
Harinarayan, Anand
Rajaraman, Jeffrey
D. Ullman: Implementing Data Cubes Efficiently. SIGMOD
Conf. 1996: 205-216 |
Ali Anvari | |
3/15 | Database Tuning
Optional Reading: Data Engineering Bulletin June 1999 |
Guest lecture by Philippe Bonnet | Philippe Bonnet is an expert on database tuning. |
3/27 | Online aggregation Joseph M. Hellerstein, Peter
J. Haas, Helen
Wang: Online Aggregation. SIGMOD
Conference 1997: 171-182 Peter
J. Haas, Joseph M. Hellerstein: Ripple Joins for Online Aggregation. SIGMOD
Conference 1999: 287-298 |
Yanling Wang | |
3/29 |
Parallel database systems and high-performance
sorting David J. DeWitt, Jim
Gray: Parallel Database Systems: The Future of High Performance
Database Systems. CACM
35(6): 85-98 (1992) David J. DeWitt, Shahram
Ghandeharizadeh, Donovan
A. Schneider, Allan
Bricker, Hui-I
Hsiao, Rick
Rasmussen: The Gamma Database Machine Project. TKDE
2(1): 44-62 (1990) The papers are on Tobias' web site. |
Guest lecture by Tobias Mayr | |
4/3 | Data mining Rakesh
Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association
Rules in Large Databases. VLDB
1994: 487-499 |
Kevin Seng | |
4/5 | Data mining Manish
Mehta, Rakesh Agrawal, Jorma
Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT
1996: 18-32 John
C. Shafer, Rakesh Agrawal, Manish
Mehta: SPRINT: A Scalable Parallel Classifier for Data Mining. VLDB
1996: 544-555 |
Tomi Yiu | |
4/10 | High-dimensional indexing
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft: When Is ''Nearest Neighbor'' Meaningful? ICDT 1999: 217-235 |
Dan Kifer | Project overview due in
class
|
4/12 | Nearest neighbor queries
Nick Roussopoulos, Stephen Kelley, Frederic Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79. http://www.cs.umd.edu/~nick/papers/nnpaper.html G. R. Hjaltason and H. Samet, Distance browsing in spatial databases, ACM Transactions on Database Systems 24, 2 (June 1999), 265-318. http://www.cs.umd.edu/~grh/incnear2.ps.gz |
Sung-hsun Su | |
4/17 | Object-relational database systems Michael Stonebraker: Inclusion of New Types in Relational Data Base
Systems. ICDE
1986: 262-269 Michael Stonebraker, Greg
Kemnitz: The Postgres Next Generation Database Management System. CACM
34(10): 78-92 (1991) |
Yong Yao | Johannes out of town |
4/19 | Client-server caching and Object Stores
Seth J. White, David J. DeWitt: QuickStore: A High Performance Mapped Object Store. VLDB Journal 4(4): 629-673 (1995) Michael J. Franklin, Michael J. Carey: Client-Server Caching Revisited. IWDOM 1992: 57-78 |
Ben Atkin | Johannes out of town |
4/24 | Database systems and XML
J. Shanmugasundaram, E. Shekita, R. Barr, M. Carey, B. Lindsay, H. Pirahesh, B. Reinwald, "Efficiently Publishing Relational Data as XML Documents", VLDB Conference, September 2000. Click here for the slides. J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, J. Naughton, "Relational Databases for Querying XML Documents: Limitations and Opportunities," VLDB Conference, September 1999. Click here for the slides. |
David Wu | |
4/26 | Deductive database systems
Chapter 27, Ramakrishnan and Gehrke: Database Management Systems (Second Edition) |
Qinghong Zhang and Jinghao Zhang | |
5/1 | no class; work on the projects | ||
5/3 | The Future of DBMS Research
TBA |
Johannes |
Course project: As
part of the course, you will do a database/data mining research project in
groups of two students. Usually several of the projects will lead to publishable
results, and I encourage you to have a publication as a goal in the course. I
will provide a list of projects, and you can choose from this list, but you can
also choose your own project.
The project will have the following milestones:
1. Survey of the relevant literature. A survey of the relevant literature
related to the project.
2. A detailed description of your project. The overview includes a description
of your algorithm and ideas, the design of the piece of software you are
proposing to build, a description of how you propose to evaluate your ideas, and
the design of the experiments that you will run.
3. Implementation and performance analysis.
4. Final project report and a short presentation to the class about your course
project.
Grading:
Your paper presentation in class: 15%
The class project: 50%
Two exams: 30% (15% each)
Class participation: 5%
2
3/6 Literature survey and draft of the project overview due in class
3/13 Handout of the take-home midterm
3/15 Midterm due in class
4/10 Project overview with detailed description of the algorithms and
design of the experimental evaluation due in class
5/11 Final project presentations and final project reports due
5/11 Handout of the final exam after the presentations (48 hours)
5/13 Handin of the final exam