CloudsKen

 

Ken Birman   

ken@cs.cornell.edu   

office: Upson 4119B   
 (607) 255-9199   

The following starts with an actual reading assignment for the next class or two (I'll try to get to the point of posting it a week ahead of time), and then below you'll see a very rough list of topics we might cover, in an order that might make sense.  Each week we'll be reading a few papers associated with these topics. 

Your only homework will be to hand in a once-per-week essay (well written, but short: 1 or 2 pages) on the main topic for that week.  Hand in your homework via email to ken@cs.cornell.edu by 11:59pm on the due date. 

Date / Please read
Jan. 26 Distributed Snapshots: Determining Global States of Distributed Systems.  K. Mani Chandy and Leslie Lamport.  ACM TOCS 3:1, Feb. 1985.  Additional useful readings: Time, Clocks, and the Ordering of Events in a Distributed System (Lamport), and Vector Clocks.
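If vector clocks are new to you, here is a tiny sketch of the bookkeeping involved (my own illustrative Python, not taken from the readings; the class and helper names are invented for this example):

# Illustrative sketch of vector clocks (not from the assigned papers).
# Each process keeps one counter per process; comparing two timestamps
# tells you whether one event happened-before the other, or whether the
# two events are concurrent.

class VectorClock:
    def __init__(self, num_procs, pid):
        self.pid = pid
        self.clock = [0] * num_procs

    def local_event(self):
        # Tick our own entry for an internal event or a send.
        self.clock[self.pid] += 1
        return list(self.clock)      # timestamp to attach to an outgoing message

    def on_receive(self, msg_clock):
        # Merge: element-wise max with the incoming timestamp, then tick our entry.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.pid] += 1

def happened_before(vc_a, vc_b):
    # a -> b iff a <= b element-wise and a != b; otherwise the two events
    # are concurrent (or identical).
    return all(x <= y for x, y in zip(vc_a, vc_b)) and vc_a != vc_b

# Tiny example: p0 sends to p1, so the send happens-before the receive.
p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
ts = p0.local_event()      # p0's clock is now [1, 0]
p1.on_receive(ts)          # p1's clock is now [1, 1]
assert happened_before(ts, p1.clock)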
1 or 2 page essay due Feb 2: Would it be possible to design a consistent-cut-based definition of consistency, and an associated implementation, that would be practical and useful, would work in cloud-scale settings, and definitely wouldn't disrupt cloud computing data centers in significant ways?
Jan. 31 State Machine Replication: A Tutorial.
Feb. 2 Paxos.  See also Robbert van Renesse and Deniz Altinbuken, Paxos Made Moderately Complex, ACM Computing Surveys, Vol. 47, No. 3, February 2015.  You might also read about Chubby, the Google lock service.
1 or 2 page essay due Feb 9: Does State Machine Replication scale? (Or, if you prefer, do services constructed using Paxos scale?)  Things you could think about (not all of them, though, or your paper will be too long; and feel free to interpret the question in other ways if you think it misses the key point): scale in the number of services running (e.g. if you have N 3-replicated services); scalability in the number of replicas within a single service; scalability in load (can Paxos "amortize" work?).
Feb. 7 History of Virtual Synchrony.
Feb 9 Dynamically Reconfigurable Services
1 or 2 page essay due Feb 16: Pick one and write about it.  The center of the virtual synchrony scalability argument is that the protocols use a single IP multicast for most operations (S of them, if it takes S multicasts to hold the data).  (1) Is it plausible to imagine a data center in which all replication "events" (updates to data, locks, etc.) map to a single IP multicast each, plus a little minor background overhead, or is this unrealistic?  Why?  (2) Some virtual synchrony multicast "flavors" relax durability; Paxos never does this.  Who's right, and what are the scalability implications of the right answer?  (If you think Ken is wrong, just say so.  He doesn't get annoyed easily!)
Feb 14 The CAP Conjecture, and some random guy's blog on what Brewer said (in his favor, he does seem to understand the topic in some depth).
Feb 16
Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.  Seth Gilbert, Nancy Lynch.  SIGACT News, Volume 33, Issue 2, June 2002.  Atomic broadcast: from simple message diffusion to Byzantine agreement.  Flaviu Cristian, Houtan Aghili, Ray Strong, Danny Dolev.  Information and Computation, Volume 118, Issue 1, April 1995.
1 or 2 page essay due Feb 23: Most cloud-computing platforms claim to offer "convergent consistency".  Suppose that you were developing the architecture for a new high-assurance application to do cloud-hosted air-traffic control, one that tracks the locations of planes, instructions given to pilots, flight plans, etc.  How might you use such a property? 

Something to consider: convergent consistency is rarely defined and never really formalized; the t-consistency model in the CAP paper is unusual in that sense, as are the probability(1) approaches mentioned in class (namely that as time goes by, the probability of the system being consistent with respect to old updates rises towards 1.0 exponentially quickly).  So you may want to start by expressing what you believe such a property should guarantee (if you consider it to be at all meaningful; if not, explain why).  Keep in mind that ATC systems are "fail safe": they can fail; there simply needs to be a safe fallback.  So don't try to require the system to do the impossible: no technology is perfect, and an all-or-nothing mentality won't win you any prizes as chief system architect!
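To make that probability(1) intuition concrete, here is a tiny push-gossip simulation (my own illustrative Python, not taken from any assigned paper; all names are invented for the example).  It estimates, round by round, the probability that every replica has received a single update; the estimate climbs toward 1.0 within a handful of rounds.

# Illustrative simulation: push gossip of one update among N replicas.
# Estimates Pr[every replica has the update] after each round, which is
# the flavor of guarantee "convergent consistency" is usually taken to mean.
import random

def rounds_to_converge(n_replicas, rng):
    informed = {0}                 # replica 0 starts out holding the update
    rounds = 0
    while len(informed) < n_replicas:
        rounds += 1
        # Each replica that was informed at the start of the round pushes once.
        for _ in range(len(informed)):
            informed.add(rng.randrange(n_replicas))   # push to a random peer
    return rounds

def estimate(n_replicas=100, trials=500, seed=1):
    rng = random.Random(seed)
    samples = [rounds_to_converge(n_replicas, rng) for _ in range(trials)]
    for r in range(1, max(samples) + 1):
        frac = sum(1 for s in samples if s <= r) / trials
        print(f"after {r:2d} rounds: Pr[everyone has the update] ~ {frac:.2f}")

if __name__ == "__main__":
    estimate()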

Note: I've added a link to the CASD paper, above.
Feb 21 Impossibility of distributed consensus with one faulty process.  Mike Fischer, Nancy Lynch, Mike Paterson.  JACM 32:2 (1985).  Effectively Nonblocking Consensus Procedures Can Execute Forever – a Constructive Version of FLP.  Bob Constable, July 2008.  Unreliable failure detectors for reliable distributed systems.  Tushar Chandra and Sam Toueg.  JACM 43(2):225–267, 1996.  If interested in failure detectors, Ken recommends that you also read the very clear Wiki page by Maurice Herlihy here.  Our lecture will survey a broad range of thinking, and for that reason we won't be doing all the proofs in class (nor could we: it would take weeks!).  But do try to understand either the basic FLP proof or Bob Constable's very elegant restatement of FLP in a constructive logic framework, which is clearer in several ways.  In the Chandra/Toueg paper, we'll be looking most closely at the actual consensus protocol they present (the one using a rotating coordinator).  You can read a Wiki summary explaining the basic ideas here.
Feb. 23 Just in case the Monday lecture leaves your neurons smoldering, we'll shift to two more practical papers about making progress under conditions that can partition a system: Increasing the resilience of atomic commit, at no additional cost.  Idit Keidar, Danny Dolev.  PODC 1995.  Providing high availability using lazy replication.  Rivka Ladin, Barbara Liskov, Liuba Shrira, Sanjay Ghemawat.  ACM TOCS 10:4 (1992).
  You deserve a break!  No essay due this week.  Going forward we'll shift to one every two weeks.

Also, in general, let Ken know if one of our essay deadlines conflicts with a paper deadline or some other real obligation.  He'll negotiate a way around this; better to have a good essay late (or none at all) than a rush job.
Feb 28 The dangers of replication and a solution.  Jim Gray, Pat Helland, Patrick O'Neil, Dennis Shasha.  Some random readings about scalability in Google and other cloud platforms.
Mar. 2. BASE: An ACID Alternative, by Dan Pritchett, July 28, 2008.  Eventually Consistent - Revisited.  Werner Vogels, Dec 2008.
1 or 2 page essay due Mar. 14: Gossip protocols are very good for some purposes, but less so for others.  Identify two basic "operations", one highlighting the power of gossip and one highlighting a foundational weakness of gossip.  In your essay, devote a few paragraphs to each.  Then conclude by posing "my-name's conjecture" (fill in your name) concerning the separation between gossip and stronger consistency models.  (You should be inspired by CAP, of course!)
Mar. 7 Epidemic algorithms for replicated database maintenance.  Alan Demers, Dan Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard Sturgis, Dan Swinehart, Doug Terry.  ACM PODC, Aug. 1987.  Experience with Grapevine: the growth of a distributed system.  Michael D. Schroeder, Andrew D. Birrell, Roger M. Needham.  ACM SOSP 1983, reprinted in ACM TOCS 2:1, 1984.  Modern work that takes this much further: Managing update conflicts in Bayou, a weakly connected replicated storage system.  D. B. Terry et al.  ACM SOSP, December 1995.  We'll focus on the whole idea of convergence in this style, but will view it as a rather practical (not mostly theoretical) approach, hence the two practical papers.
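If you'd like something concrete to play with, here is a minimal last-writer-wins anti-entropy sketch (my own illustration, far simpler than the Demers et al. or Bayou mechanisms; the names and data are invented for this example):

# Minimal anti-entropy sketch (illustration only).  Each replica is a map of
# key -> (timestamp, value); a pairwise push-pull exchange keeps, for every
# key, the entry with the larger timestamp (last-writer-wins).

def anti_entropy(replica_a, replica_b):
    for key in set(replica_a) | set(replica_b):
        a_entry = replica_a.get(key, (0, None))
        b_entry = replica_b.get(key, (0, None))
        # Tuples compare by timestamp first; equal timestamps fall back to an
        # arbitrary but deterministic tiebreak on the value.
        winner = max(a_entry, b_entry)
        replica_a[key] = winner
        replica_b[key] = winner

# Example: two replicas with conflicting histories converge after one exchange.
r1 = {"flight-42": (5, "FL310"), "flight-7": (2, "hold")}
r2 = {"flight-42": (8, "FL350")}
anti_entropy(r1, r2)
assert r1 == r2 == {"flight-42": (8, "FL350"), "flight-7": (2, "hold")}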
Mar. 9
Dynamo: Amazon's highly available key-value store. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.   In Proceedings of the 21st ACM SOSP (Stevenson, Washington, October 2007).  We may also discuss Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining.  Robbert van Renesse, Kenneth Birman and Werner Vogels.  ACM Transactions on Computer Systems, May 2003, Vol.21, No. 2, pp 164-206.
Mar. 14.

Making Distributed Systems Robust.  Chi Ho, Robbert van Renesse, Danny Dolev.  OPODIS 2007.  Guest lecturer: Robbert van Renesse.  Please brush up on the Byzantine Agreement model by reading the Wikipedia article on it before this class if you have forgotten the model or haven't seen it previously.

Mar. 16.
Nysiad: Practical Protocol Transformation to Tolerate Byzantine Failures.  Chi Ho, Robbert van Renesse, Mark Bickford, Danny Dolev.  NSDI 2008.  Guest lecturer: Robbert van Renesse.
Spring break: no classes March 21 or 23. Have fun!
Mar. 28. Thoughts on roles of formal models in real systems.  No paper assigned.
Mar. 30. Convergent consistency in Kelips and Bimodal Multicast.  Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead.  Indranil Gupta, Ken Birman, Prakash Linga, Al Demers and Robbert van Renesse.  2nd International Workshop on Peer-to-Peer Systems (IPTPS '03); February 20-21, 2003.  Claremont Hotel, Berkeley, CA, USA.   Bimodal Multicast.  Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu and Yaron Minsky.  ACM Transactions on Computer Systems, Vol. 17, No. 2, pp 41-88, May, 1999. 
Apr. 4 Hussam Abu-Libdeh, presenting his recent work on building scalable consistent applications
Apr. 6 Guest lecturer: Ymir Vigfusson
Apr. 11

Introduction to transactional languages and transactional memory (STMs).  Distributed Programming in Argus.  Barbara Liskov.  CACM, Volume 31, Issue 3, March 1988.

Apr. 13 Memory consistency and event ordering in scalable shared-memory multiprocessors.  Kourosh Gharachorloo et al.  Proc. ACM SIGARCH, Volume 18, Issue 3a, June 1990.
Apr. 18 On Interprocess Communication--Part I: Basic Formalism, Part II: Algorithms.  Leslie Lamport. Distributed Computing 1, 2 (1986), 77-101.  Also appeared as SRC Research Report 8.
1 or 2 page essay due April 27 (to give me time to read them); discussion in class on May 4.  Our final essay topic: We noticed in class that the notion of soft and hard state as defined by Brewer in the CAP lecture (Feb 14) and then used in the BASE papers (Mar 2) was really informal compared to the style of formalism used by Gharachorloo in the memory consistency work we looked at on April 13, or the formalism Leslie Lamport proposes in his interprocess communication paper from April 18.  So here's your impossible mission: propose a definition of soft state and hard state, using one of those formalisms if possible, such that (1) soft state and hard state are distinct (e.g. from the definitions I can see that they are non-intersecting), (2) the definitions add up to "all state" (e.g. every kind of state falls into one or the other case), and (3) the definitions are non-trivial (e.g. not everything is hard state).  If you can't be formal, do this in a hand-waving manner.  We'll discuss what you came up with on May 4.  If it can't be done, write a page or 2 explaining why.
Apr. 20

Wait-free synchronization. Maurice Herlihy.  ACM TOPLAS, 13(1):124-149, 1991.

Apr. 25 Why STM can be more than a research toy.  Aleksandar Dragojevic, Pascal Felber, Vincent Gramoli, and Rachid Guerraoui.  Comm. ACM 54, 4 (April 2011), 70-77.  Also, check out the huge collection of papers at http://www.cs.wisc.edu/trans-memory/biblio/index.html
Apr. 27 Marcos Vaz Salles, presenting: Consistency Rationing in the Cloud: Pay only when it matters.  

Tim Kraska, Martin Hentschel, Gustavo Alonso, Donald Kossmann.  VLDB '09.

May 2 Transactional memory: architectural support for lock-free data structures.  Maurice Herlihy and J. Eliot B. Moss.  In Proceedings of the 20th annual international symposium on computer architecture (ISCA '93).  Slide sets: Set-I, Set-II, Set-III
May 4

Overcoming the “D” in CAP: Using Isis2 To Build Locally Responsive Cloud Services.  Ken Birman, Qi Huang, Dan Freedman.  Submitted to ACM SOCC 2011.



Notions of Consistency in Cloud Computing Settings
    Chandy: Consistent Snapshots, Gray: Transactions, Wing: Linearizability
    Lamport: Paxos (Schneider: State machine replication)
    Birman: Virtual synchrony
    Malkhi: Dynamically reconfigurable service model
    Lamport: Byzantine State Machines and Castro: Byzantine Data Replication
    Demers: Gossip protocols and convergent consistency

CAP Principle and Theorem, other "Speed of light" Limits Worth Knowing About
    Miller: The CAP Principle
    Gilbert: The CAP Theorem
    Stonebraker: CAP Reconsidered
    Various: LADIS Presentations on Data Center Consistency
    Skeen: Impossibility of non-blocking 2-phase commit, 3-phase commit
    FLP: Impossibility of Distributed Consensus with One Faulty Process
    Chandra: Weakest Failure Detector to Achieve Consensus

Beyond CAP
    Gray: Dangers of Database Replication and a Solution
    Pu: Epsilon Serializability
    Agarwal: Checkpoint Serializability
    Raghu: PNUTS

What Next?
    Reed: Zookeeper
    Vigfusson: Dr. Multicast and Birman: Isis2
    Language embeddings: Liskov: Argus, Sagas, C# 5.0
    Software and hardware transactional memories