CS 734 (959-565)       Topics in Data Replication

Time: Tues+Thurs 1:25
Permanent Room: Phillips 219

Alan Demers ademers@cs.cornell.edu
4115 Upson, 255-9207
Hours: Wed after 1:00 or by appt.

Announcements.

Tues 11/28: Adrian on consensus and distributed commitment: survey of results from [GHM+00], [GS97], [G95].

Thurs 11/30: (deferred discussion of) [NFK99]

Tues 11/21:Sunny Gleason on [SAS+95] (Data Replication in Mariposa).

Thurs 11/23: Thanksgiving

Tues 11/14: Indranil on [AAF+95] (broadcast disk stuff).

Thurs 11/16: Kate on [SNR+99] (concurrency control for read-write transactions in a broadcast data base).

Some suggested future subjects with references:

Ben Noble et al., "Fluid Replication" work.  Fluid Replication is basically dynamic creation and destruction of replicas when and where appropriate, based on usage patterns.  References are [NFK99] (A Case for Fluid Replication) and [NFC00] (Deferring Trust in Fluid Replication).  Nice to relate this to the (no longer extant) Mariposa project [Mariposa] -- we might even want to present some of that material.

A sojourn into the other kind of broadcast protocols -- protocols optimized for mobile clients on wireless networks.  There are two new concerns.  First, the channel is likely to be asymmetric, with a much higher data rate on the down link than on the up link.  Second, energy consumption of a mobile device is an important parameter.  These concerns lead to procols that minimize data sent by clients, in exchange for repeatedly broadcasting the entire contents of the database.  A seminal reference is the Datacycle Architecture paper [HGL+87].  This is an old paper, based on somewhat obsolete technological assumptions; probably not worth presenting but an interesting piece of bedtime reading.  Work on "Broadcast Disks" structures the broadcasts so more frequently referenced data is broadcast more often.  References include [AAF+95], [AFZ96] and [AFZb96]; some subset of these papers should be presented.   More recent work addresses consistency and concurrency control for read-only transactions ([PC99]) and read-write transactions ([SNR+99]).  Each of these papers could be presented.  There is also a (huge) TODS paper [DVCK99] (Broadcast Protocols to Support Efficient Retrieval from Databases by Mobile Users) that I haven't read yet, but it looks like it could be interesting.

 

Tu 11/07 I'll present [YV00] (Yu and Vahdat, Design and Evaluation of a Continuous Consistency Model for Replicated Services), and a bit of [YVb00] (Efficient Numerical Error Bounding ...).  At a high level this work looks like an epsilon-serializability-like system in a Bayou-like system in which applications always provide total conflict resolution procedures.  The numerical error bounding is for increment- and decrement- updates.

Thu 11/02 We'll cover [OW99], "... Precision-Performance Tradeoff for Aggregation Querieths Over Replicated Data"  Note, by e way, they really mean "cached" data :-)

Thu 10/19: Preview of epsilon-serializability.  For next time we'll do [RP94].  Other refs on this material are [PH+94] and [PL91].

Tues 10/17:  Following up on the Bayou "disconnected operation" theme, I'll talk about Isolation-Only transactions.  References: [LS94] (not enough detail), [LS95] (not as good as the previous one, I think), and sections of [L96] (Lu's thesis; definitive but vastly too long).

Thurs 10/12: Cancelled, I am out of town.

Thurs 10/5: The Bayou "Session Guarantees" paper [TDP+94].

Tues 10/3: Chang Won Choi will present The MIT "Lazy Replication" paper [LLSG92].

Tues 9/26: On rereading [DGH+87] I think I was a bit harsh to say below that this paper is not worth a class discussion, so I plan to discuss it.

Thurs 9/21: The next topic is the gossip-style protocols generally termed epidemic algorithms or "anti-entropy".  Here's a list of subtopics with references attached for class presentation:

[DGH+87] The "Clearinghouse paper" is where I started in this area.  Defines epidemic algorithms, the application is a name server so 1SR is really overkill, eventual convergence is a perfectly adequate notion of consistency.  This is a really old paper now, but interesting to re-read.  Worth only part of a class discussion, I think.

[G92] Richard Golding's thesis, develops the timestamped anti-entropy (TSAE) protocol assuming crash- recovery (Ch 5), and develops membership protocols that relax the crash-recovery assumption (Ch 6).

[LLSG92] The MIT "Lazy Replication" paper, which is quite widely referenced.   Uses essentially TSAE, introduces notion of client-specified consistency requirements similar to those supported in ISIS for group communication.  See also the Eventual Serializability paper [FGL+96], which is basically a more formal presentation of the Lazy Replication work.

Bayou [PST+97] represents a different approach to client-specified weak consistency; it also includes slightly protocols for site creation and retirement that are slightly different from Golding's.

[RGK95] Scalable Update Propagation in Epidemic Replicated Databases takes a view similar to the Clearinghouse paper, that an epidemic algorithm is used to compare and resolve timestamped data values rather than a log of operations.  The paper develops efficient techniques for resolving differences. 

Tues 9/19:  I want to defer relative serializability for a bit; instead I'll hand out a sketch of two ways to approach Ken's suggestion of distributing the certification work among the sites, then proceed to a discussion of the "Epidemic Quorums" paper [HSAE99] -- I originally intended to defer this paper until discussion of epidemic algorithms, but it is a 1SR technique that seems to fit better here.

Thurs 9/14: I'll discuss the reordering certification test from [P99], then some issues related to crash recovery.  Then, assuming there is time, start relative serializability [ABE+93] with the goal of adapting the database state machine approach to it.

Tues 9/12: Okay, since I'm still so far behind with the outline, I'm going to start the "exploiting atomic broadcast" subject myself.  Tuesday will cover [PGS98b] or, equivalently, [P99] through Section 3.4.

Thurs 9/14: I'll discuss the reordering certification test from [P99], then some issues related to crash recovery.  Then, assuming there is time, start relative serializability [ABE+93] with the goal of adapting the database state machine approach to it.

Outline

Raw Bibliography -- every paper we mention will appear here.

Overview

Replication is an important topic in databases, file systems and distributed systems.  From a database perspective, replication can be viewed simply as a technique to provide increased query performance and availability in distributed and "multi-database" systems.  From a distributed system perspective, the need to maintain a consistent copy of the state at each process can profitably be viewed as a data replication problem.  We will discuss techniques developed from both perspectives in a uniform framework, with particular emphasis on topics related to scalability and availability, including weak and probabilistic consistency models, and the use of asynchronous and randomized propagation protocols.

 

I expect to do something under half the lectures myself.  The other lectures will be presentations of selected papers by class participants.  Once we begin scheduling these presentations, everyone will be expected to have read the scheduled paper before lecture, and to be prepared for a sensible discussion.  Much of the (S/U) grade will be based on the quality of presentations and participation in discussions.

 

A goal of the course is to produce a website that can be a resource for others interested in the area of replication.  (This is in addition to the standard course website, which will be appearing shortly.)  It should include a fairly extensive annotated bibliography, plus original material giving a unified treatment of the course content, at a level and polish somewhere between typical course notes and a Computing Surveys paper.  I would like this to be a collective effort; it will also contribute to your (S/U) course grade, and perhaps fame and self-esteem as well.

 

Topics: Here is an unordered list of relevant topics.  I expect  to visit most of this material.  The first couple of lectures I will be doing overview and background material, based on your responses to the survey below.  By Thursday the course website will have appeared, with links to papers on most of these topics.

  1. crash / failure models; recovery, reintegration
  2. taxonomy: master vs group ownership; eager vs lazy propagation
  3. eager commit protocols using atomic broadcast
  4. eager commit protocols using epidemic algorithms
  5. serializable lazy-master schemes using data placement
  6. semantic consistency schemes
  7. disconnected operation
  8. lazy-group schemes with reconciliation; eventual consistency
  9. gossip protocols and probabilistic guarantees
  10. age distributions, bounds on aggregate query results, …