Database Management Systems
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information? -- T.S. Eliot
Logistics
- CS632, Cornell University, Spring 2007.
- Instructor: Johannes Gehrke
- Tuesdays and Thursdays 2:50-4:05pm,
Upson 109.
Overview
The course has two parts. The first part (before spring break) covers a basic background on system
abstractions and algorithmic tools for managing large datasets. The second part
(after spring break) covers current research topics in this area such as the management of structured
and unstructured data, search, and data mining. The course prerequisites are algorithms and
probability (at the level of CS482) and a basic background in systems (at the
level of CS414).
The main work for the course consists of a paper summary for each class (the
20 best paper summaries will count in total for 20% of your grade), short
answers to questions about the papers (the 20 best answers will count in total
for 20% of your grade), the presentation of one paper in the course (10% of
your grade), and a course project (50% of your grade).
In the project, I expect you to try to do original research. The project
encompasses the following steps:
- Project proposal with references. The proposal should contain your goals
for the project and the results of an initial literature search. The project
proposal is due March 6.
- Full literature review for the project, a formal problem description,
and a high-level discussion of your approach, due March 29.
- An intermediate status update the week of April 17. An email to Johannes
is ok.
- The final project report. The project report should be formatted like a
regular paper for a conference submission (use the ACM style). The final
project is due May 11.
Course Outline (Draft)
Note: Write your summary about the paper marked with a (*).
Part I: The Classics
Data Models
January 25
January 30:
- Michael Stonebraker and Joseph M. Hellerstein.
Anatomy of a database system.
- Donald D. Chamberlin, Morton M. Astrahan, Mike W. Blasgen, Jim Gray,
W. Frank King III, Bruce G. Lindsay, Raymond A. Lorie, James W. Mehl,
Thomas G. Price, Gianfranco R. Putzolu, Patricia G. Selinger,
Mario Schkolnick, Donald R. Slutz, Irving L. Traiger, Bradford W. Wade,
Robert A. Yost. A History and Evaluation of System R.
Commun. ACM 24(10): 632-646 (1981).
Query Optimization and Query Processing
February 1
February 6
February 8
Access Methods
February 13
- M. Stonebraker, "Operating
System Support for Database Management", CACM 24(7), pp. 412-418, 1981.
- A. Guttman, "R-Trees:
A Dynamic Index Structure for Spatial Searching", SIGMOD Conference,
1984. (*)
- J. Nievergelt, H.
Hinterberger, K. C. Sevcik, "The
Grid File: An Adaptable, Symmetric Multikey File
Structure", TODS 9(1), 1984.
Storage
February 15
One-Pass Query Processing and Sampling
February 20
February 22
Transaction Management
February 27
- J. Gray, et al., "Granularity of Locks and Degrees of Consistency in a
Shared Database", IFIP Working Conference on Modeling of Data Base
Management Systems, 1977.
- P. L. Lehman, S. B. Yao, "Efficient Locking for Concurrent Operations on
B-Trees", TODS 6(4), 1981.
- H. T. Kung, J. T. Robinson, "On
Optimistic Methods for Concurrency Control", TODS 6(2), 1981. (*)
March 1
March 6
A Detour: Web Services and Databases
March 8
Transaction Management (Contd.)
March 13
- C. Mohan, B. G. Lindsay, R. Obermarck,
"Transaction Management in the R* Distributed Database Management System",
TODS 11(4), 1986.
- J. Gray, P. Helland, P. E. O'Neil, D.
Shasha, "The Dangers of Replication and a
Solution", SIGMOD Conference, 1996. (*)
Transaction Management (Contd.)
March 15
- C. Mohan, et al., "ARIES: A Transaction Recovery Method Supporting
Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging",
TODS 17(1), 1992. (No summary for this lecture.)
Part II: The Nouveaux (After Spring Break)
Decision Support and OLAP
March 27 (Guest lecture by Alan Demers)
- Adam Bosworth, Surajit Chaudhuri, Jim Gray, Andrew Layman, Frank Pellow,
Hamid Pirahesh, Don Reichart, and Murali Venkatrao.
Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab,
and Sub-Totals, Data Mining and Knowledge Discovery 1(1), 1997. (*)
- Y. Zhao, P. Deshpande, and J. Naughton.
An Array-Based Algorithm for Simultaneous Multidimensional Aggregates, SIGMOD 1997.
March 29 (Guest lecture by Mirek Riedewald)
Data Mining
April 3 (Lakshmi)
April 5 (Yunsong Guo)
Deductive Database Systems
April 10 and April 12 (Parvati will give one lecture)
Benchmarking
April 17 (Chi Ho)
- Anon, et al, "A
Measure of Transaction Processing Power", Datamation, 31(7). (*)
- M. J. Carey, D. J. DeWitt, J. F. Naughton: "The
OO7 Benchmark", SIGMOD Conference, 1993.
- The BUCKY
Object-Relational Database Benchmark. SIGMOD 1997. This paper describes
a benchmark for object-relational database systems; follow the link to get
postscript describing the results of running the benchmark against one
object-relational database system, and also to get an SQL3-ish
implementation of the benchmark and a data generation program.
Data Stream Processing
April 19 (Philipp Unterbrunner)
- João Pereira, Françoise Fabret, H.-Arno Jacobsen, François Llirbat,
Radu Preotiuc-Pietro, Kenneth Ross, and Dennis Shasha.
Filtering Algorithms and
Implementation for Very Fast Publish/Subscribe Systems. SIGMOD 2001.
Please make sure to read the SIGMOD 2001 paper from the website; the other
papers are good reading, but we will discuss the SIGMOD 2001 paper.
- D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M.
Stonebraker, N. Tatbul, S. Zdonik. Aurora: A New Model and Architecture for
Data Stream Management. VLDB Journal 12(2): 120-139, August 2003. (*)
April 24 (Adam Arbree)
April 26 (Guest lecture by Mingsheng Hong)
- R. Barga, J. Goldstein, M. Ali, M. Hong. "Consistent Streaming Through
Time -- A Vision for Event Stream Processing".
CIDR 2007
- A. Demers, J. Gehrke, M. Hong, M. Riedewald, W. White. "Towards
Expressive Publish/Subscribe Systems".
EDBT 2006. (*)
Main Memory Database Systems (Last week of classes: no more summaries
needed; work on your projects instead.)
May 1 (Nitin)
Database Privacy
May 3
- Simson Garfinkel. To Know Your Future. From Database Nation, O'Reilly
Press, 2nd Edition.
- Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthu
Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006.