CS632 Course Timeline

Project Timeline

Date

Deliverable

Feb 10

Project decisions are made

Mar 1

Literature Survey due at 8:00am

Mar 10

Project proposal due at 8:00am

May 6

Final project report due at 8:00am

Class Timeline

Note that you will have to submit written comments for two papers. The paper for which you have to propose extensions is marked with a (*). Papers that you could choose to present are marked (Student).

Date

Topic and Paper

Presenter

 Jan 27

Introduction to the course

Course overview, history of database systems.

Introductory Course Readings (read by Monday, February 3): The Relational Model and Prototypes

E. F. Codd, "A Relational Model of Data for Large Shared Data Banks", CACM 13(6), 1970.

M. Astrahan, et al., "System R: Relational Approach to Database Management", TODS 1(2), 1976.

M. Stonebraker, et al., "The Design and Implementation of INGRES", TODS 1(3), 1976.

Johannes

 Jan 29

Database Systems and Operating Systems

M. Stonebraker, "Operating System Support for Database Management", CACM 24(7), pp. 412-418, 1981.

Buffer Management

H. T. Chou, D. J. DeWitt, "An Evaluation of Buffer Management Strategies for Relational Database Systems", VLDB Conference, 1985.

Professor Jai
Shanmugasundaram

 Feb 1

Short discussion of introductory readings from the January 27 lecture. (Discussion will be lead by Johannes.) Readings for this lecture:

Index Structures

A. Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching", SIGMOD Conference, 1984. (*)

J. Nievergelt, H. Hinterberger, K. C. Sevcik, "The Grid File: An Adaptable, Symmetric Multikey File Structure", TODS 9(1), 1984.

Optional reading:

D. Comer, "The Ubiquitous B-Tree", Computing Surveys, 1979.

Johannes

 Feb 3

Join Processing

L. D. Shapiro, "Join Processing in Database Systems with Large Main Memories", TODS 11(3), 1986. (*)

Query Optimization

P. G. Selinger, et al., "Access Path Selection in a Relational Database Management System", SIGMOD Conference, 1979.

Professor Al Demers

Feb 8

No Class

 Feb 10

Selectivity Estimation

V. Poosala, Y.E. Ioannidis, P. J. Haas, E. Shekita, "Improved Histograms for Selectivity Estimation of Range Predicates", SIGMOD Conference, 1996. (*)

Query Rewrite

T. Y. Cliff Leung, H. Pirahesh, P. Seshadri, J. M. Hellerstein, "Query Rewrite Optimization Rules in IBM DB2 Universal Database", IBM Research Report, IBM RJ10103.

Optional readings:

Nicolas Bruno, Surajit Chaudhuri, Luis Gravano: STHoles: A Multidimensional Workload-Aware Histogram. SIGMOD Conference 2001

Yannis E. Ioannidis: The History of Histograms (abridged). VLDB 2003: 19-30

Johannes

Feb 15

Concurrency Control

J. Gray, et al., "Granularity of Locks and Degrees of Consistency in a Shared Database", IFIP Working Conference on Modeling of Data Base Management Systems, 1977. In Hellerstein/Stonebraker textbook.

P. L. Lehman, S. B. Yao, "Efficient Locking for Concurrent Operations on B-Trees", TODS 6(4), 1981. In Hellerstein/Stonebraker textbook. (*)

Johannes

Feb 17

Concurrency Control (Continued)

H. T. Kung, J. T. Robinson, "On Optimistic Methods for Concurrency Control", TODS 6(2), 1981. (*)

R. Agrawal, M. J. Carey, M. Livny, "Concurrency Control Performance Modeling: Alternatives and Implications", TODS 12(4), 1987. In Hellerstein/Stonebraker textbook. (Student)

Johannes

Feb 22

Recovery

T. Harder, A. Reuter, "Principles of Transaction-Oriented Database Recovery", Computing Surveys 15(4), 1983.

C. Mohan, et al., "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging", TODS 17(1), 1992. In Hellerstein/Stonebraker textbook.

For this class, you do not have to propose extensions and/or weaknesses to the two papers (but you still have to give your summaries with contributions). Instead of weaknesses and extensions, please answer the following questions in your comments.

For the Haerder/Reuter paper:

  • Similar to the examples in Section 3.4, discuss the following implementation technique: Not ATOMIC, STEAL, not FORCE, FUZZY

For the Mohan et al paper:

  • Do we need the RecLSN for the correctness of the algorithm? If so, why? If not, why not, and why is it included in the data structures in the paper?
  • Assume that we have a log record that spans two pages: The REDO information takes one page, and the UNDO information takes a second page. When you write these two records, will you first write the REDO part or first the UNDO part and why?

Johannes

Feb 24

Extended Transaction Models

H. Garcia-Molina, K. Salem, "SAGAS", SIGMOD Conference, 1987. (*) (Lucja Kot)

H. Wachter, A. Reuter, "The ConTract Model", Database Transaction Models for Advanced Applications (A. Elmagarmid, ed.), 1992.  (Johannes)

For this class, you do have to propose extensions and/or weaknesses only for the ConTract paper (but you still have to give your summaries with contributions for both papers). For the SAGAS paper, instead of weaknesses and extensions, please answer the following question in your comments.

Question:

The authors list several difficulties that arise when sagas are extended to their parallel execution model. Suppose that we introduce a new, more general notion of a parallel saga; we will allow for sagas to mutually influence each other's behavior (perhaps by message passing), rather than having a one-sided dependency between a parent and a child. Suppose we also enforce synchronized savepoints for such parallel transactions: if S1 and S2 can influence each other's behavior, we cannot take a savepoint in S1 without taking a savepoint in S2 at the exact same time. All other aspects of execution are as in the standard saga model.

  • In this new model of parallel execution, which of the disadvantages listed for the parent/child model will still apply, and which will not? Explain. (If answering this question requires making a further design choice in the execution model, explain that choice and how it affects the answer.)

Lucja Kot, Johannes

March 1

Distributed Database Systems

R. Williams, et al., "R*: An Overview of the Architecture", IBM Research Report RJ3325. (*)

L. F. Mackert, G. M. Lohman, "R* Optimizer Validation and Performance Evaluation for Distributed Queries", VLDB Conference, 1986.

Johannes

March 3

Distributed Transaction Management and Replication

C. Mohan, B. G. Lindsay, R. Obermarck, "Transaction Management in the R* Distributed Database Management System", TODS 11(4), 1986. In Hellerstein/Stonebraker textbook. (*) (David Martin)

J. Gray, P. Helland, P. E. O'Neil, D. Sasha, "The Dangers of Replication and a Solution", SIGMOD Conference, 1996. In Hellerstein/Stonebraker textbook. (Johannes)

For this class, you do have to propose extensions and/or weaknesses only for the Replication paper (but you still have to give your summaries with contributions for both papers). For the Transaction management paper, instead of weaknesses and extensions, please answer the following question in your comments.

Questions:

  1. The paper states that, in order to maintain atomicity, each subordinate S must make sure (by forcing log records) never to ask the coordinator about a decision that S has already ACKed. For each protocol (2P, PA, PC), give a scenario where failure to follow this principle breaks atomicity. (Assume that the coordinator and all other subordinates follow the protocol strictly.)
  2. In Presumed Commit (PC), the root’s commit record is forced even though all other nodes do not force their commit records. Why?

David Martin, Johannes

March 8

Wide-Area Distributed and Parallel Database Systems

M. Stonebraker, et al., "Mariposa: A Wide-Area Distributed Database System", VLDB Journal, 1996. (*) (Huhn-Kie Lee)

D. J. DeWitt, J. Gray, "Parallel Database Systems: The Future of High Performance Database Systems", CACM 35(6), 1992. In Hellerstein/Stonebraker textbook. (Johannes)

Huhn-Kie Lee, Johannes

March 10

Parallel Database Systems

Goetz Graefe: Encapsulation of Parallelism in the Volcano Query Processing System. SIGMOD Conference 1990: 102-111. In Hellerstein/Stonebraker textbook. (Johannes)

Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, David B. Lomet: AlphaSort: A Cache-Sensitive Parallel External Sort. VLDB Journal 4(4): 603-627 (1995) (In Hellerstein/Stonebraker textbook.) (*) (Vikram)

Optional readings:

D. DeWitt, et al. "The Gamma Database Machine Project", TKDE 2(1), 1990.

Krishnaprasad Vikram,
Johannes

March 15

One-Pass Computations

J. Ian Munro, Mike Paterson: Selection and Sorting with Limited Storage. Theor. Comput. Sci. 12: 315-323(1980)  (Johannes)

Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Approximate Medians and other Quantiles in One Pass and with Limited Memory. SIGMOD Conference 1998: 426-435. (*) (Ian Kash)

Ian Kash, Johannes

March 17

Sampling

 

Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985) (Johannes)

 

Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: On Random Sampling over Joins. SIGMOD Conference 1999: 263-274. (*) (Radu Popovici)

Radu Popovici, Johannes

March 22

Spring break – no class.

 

March 24

Spring break – no class.

 

March 29

Online Aggregation

Joseph M. Hellerstein, Peter J. Haas, Helen Wang: Online Aggregation. SIGMOD Conference 1997: 171-182 (*) (Thanh)

 

Peter J. Haas, Joseph M. Hellerstein: Ripple Joins for Online Aggregation. SIGMOD Conference 1999: 287-298 (Muthu Venkitasubramaniam)

 

Optional readings:

 

Vijayshankar Raman, Bhaskaran Raman, Joseph M. Hellerstein: Online Dynamic Reordering for Interactive Data Processing. VLDB 1999: 709-720

Muthu Venkitasubramaniam,
Thanh

March 31

Objects and Databases

Lamb, et al., "The ObjectStore Database System", CACM 34(10), 1991. (*) (Yejin Choi)

M. Franklin, M. J. Carey, "Client-Server Caching Revisited", International Workshop on Distributed Object Management, 1992. (Johannes)

Yejin Choi, Johannes

April 5

Object-Relational Database Systems

M. Stonebraker, "Inclusion of New Types in Relational Data Base", ICDE Conference, 1986. (*)

M. Stonebraker, G. Kemnitz, "The POSTGRES Next-Generation Database Management System", CACM 34(10), 1991. (Lucian)

Lucian, Johannes

April 7 Benchmarks

Anon, et al, "A Measure of Transaction Processing Power", Datamation, 31(7).

M. J. Carey, D. J. DeWitt, J. F. Naughton: "The 007 Benchmark", SIGMOD Conference, 1993. (Yonatan Ben-Simhon)

The BUCKY Object-Relational Database Benchmark (with Michael Carey, David DeWitt, Johannes Gehrke, Dhaval Shah, and Mohammed Asgarian). In SIGMOD97. This paper describes a benchmark for object-relational database systems; follow the link to get postscript describing the results of running the benchmark against one object-relational database system, and also to get an SQL3-ish implementation of the benchmark and a data generation program. (Student)

Yonatan, Johannes
April 12 Deductive Database Systems

F. Bancilhon, R. Ramakrishnan, "An Amateur's Introduction to Recursive Query Processing Strategies", SIGMOD Conference, 1986.

Raghu Ramakrishnan, Johannes Gehrke. Database Management Systems, 3rd edition, chapter 25. (Xin)

Xin, Johannes
April 14 Active Database Systems and Triggers

D. R. McCarthy, U. Dayal, "The Architecture of an Active Data Base Management System", SIGMOD Conference, 1989.

S. Ceri, J. Widom, "Deriving Production Rules for Constraint Maintenance", VLDB Conference, 1990. (*)

Guest lecture by Mingsheng Hong
April 19 Main Memory Database Systems

P. Bohannon, et al., "The Architecture of the Dali Storage Manager", Journal of Multimedia Tools and Applications 4(2), 1997.

J. Rao, K. A. Ross, "Making B+-Trees Cache Conscious in Main Memory", SIGMOD Conference, 2000. (Ruijie Wang, *)

Ruijie
April 21 Decision Support

Optional:

J. Gray, et al., "Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals", Data Mining and Knowledge Discovery 1(1), 1997.

Required:

P. O'Neill, D. Quass, "Improved Query Performance with Variant Indexes" SIGMOD Conference, 1997. (Gun)

Y. Zhao, P. Deshpande, J. Naughton, "An Array-Based Algorithm for Simultaneous Multidimensional Aggregates", SIGMOD Conference, 1997. (Daria, *)

Daria, Gun
April 26 Semistructured Data

J. McHugh, S. Abiteboul, R. Goldman, D. Quass, J. Widom, "Lore: A Database Management System for Semistructured Data", SIGMOD Record 26(3), 1997.

J. Shanmugasundaram, et al., "Relational Databases for Querying XML Documents: Limitations and Opportunities", VLDB Conference, 1999. (Kamal, *)

Kamal
April 28 Data Privacy

K. LeFevre, R. Agrawal, V. Ercegovac, R. Ramakrishnan, Y. Xu, and D. DeWitt, "Limiting Disclosure in Hippocratic Databases." In Proceedings of the 30th International Conference on Very Large Databases, August 2004. (pdf) (Muthu)

K. LeFevre, D. DeWitt, and R. Ramakrishnan. "Incognito: Efficient Full-Domain k-Anonymity." To appear in ACM SIGMOD International Conference on Management of Data, June 2005. (Daria, *)

Muthu

Daria

May 3 Data Mining (no more summaries needed, work on the projects instead)

Association rules and sequential patterns (Thomas Finley):

Constraints (you can read the short version of the paper):

Thomas Finley,

Mohammed

May 5 Data Stream Processing (no more summaries needed, work on the projects instead)

João Pereira, Françoise Fabret, H.-Arno Jacobesen, François Llirbat, Radu Preotiuc-Prieto, Kenneth Ross, and Dennis Shasha Le Subscribe: Publish and Subscribe on the Web at Extreme Speed. SIGMOD 2001. (Zoya, *)

D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, S. Zdonik. Aurora: A New Model and Architecture for Data Stream Management. In VLDB Journal (12)2: 120-139, August 2003. [PDF] (Helgi)

Zoya, Helgi