CS5414 (Fall 2012)
Distributed Computing Principles:
Topics and Readings

Here is a high-level listing of the topics we will try to cover this semester. Readings are noted for each topic.


[A11] James Aspnes. Notes on Theory of Distributed Systems. On-line course notes for Yale University, CS465/565, Fall 2011.

[BM93] Ozalp Babaoglu and Keith Marzullo. Consistent global states of distributed systems: Fundamental concepts and mechanisms. Earlier version appeared as Chapter 4 in Distributed Systems, Sape J. Mullender (Ed.), Addison Wesley, 1993.

[B93] Ken Birman. The process group approach to reliable distributed computing. Communications of the ACM 36, 12 (December 1993), 37--53.

[BS96] Thomas Bressoud and Fred Schneider. Hypervisor-based fault tolerance. ACM Transactions on Computer Systems 14, 1 (February 1996), 80 -- 107.

[BMST93] Navin Budhiraja, Keith Marzullo, Fred B. Schneider, and Sam Toueg. The primary-backup approach. Chapter 8, Distributed Systems, 2nd Edition (S. Mullender, ed.), Addison Wesley, 1993, 199--215.

[CL85] K. Mani Chandy and Leslie Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems 3, 1 (February 1985), 63 -- 75.

[EAWJ02] Mootaz Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys 34, 3 (September 2002), 375--408.

[FGK11] Felix C. Freiling, Rachid Guerraoui, and Petr Kuznetsov. The failure detector abstraction. ACM Computing Surveys 43, 2 (February 2011), 9:1 -- 9:40.

[H96] Maurice Herlihy. A quorum-consensus replication method for abstract data types. ACM Transactions on Computer Systems 4, 1 (February 1986), 32 -- 53.

[KS08] Ajay Kshemkalyani and Mukesh Singhal. Distributed Computing. Principles, Algorithms, and Systems. Cambridge University Press, 2008.

[L78] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21, 7 (July 1978), 558 -- 565.

[L89] Leslie Lamport. A Simple Approach to Specifying Concurrent Systems. Communications of the ACM 32, 1 (January 1989), 32 -- 45. Also appeared as SRC Research Report 15.

[LPS82] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems 4, 3 (July 1982), 382--401.

[MR10] Michael Merideth and Michael Reiter. Selected Results from the Latest Decade of Quorum Systems Research. Chapter 10 in Replication Theory and Practice, Lecture Notes in Computer Science, vol. 5959. Springer-Verlag, 2010, 185 -- 206.

[MS85] Stephen Mahaney and Fred Schneider. Inexact agreement: accuracy, precision, and graceful degradation. Proceedings of the Fourth Annual ACM Symposium On Principles Of Distributed Computing (Ontario, Canada) 1985, 237 -- 249.

[MS88] Keith Marzullo and Frank Schmuck. Supplying High Availability with a Standard Network File System. Proceedings of the Eighth International Conference on Distributed Computing Systems, IEEE, 1988, 447--453.

[S87] Fred B. Schneider. Understanding Protocols for Byzantine Clock Synchronization. Technical Report, Computer Science Department, Cornell University. August 1987.

[S93] Fred B. Schneider. What good are models and what models are good? Chapter 2, Distributed Systems, 2nd Edition (S. Mullender, ed.), Addison Wesley, 1993, 17--25.

[S96] Fred B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys 22, 4 (December 1990), 299--319.

[SGS83] Fred Schneider, David Gries, and Richard Schlichting. Fault-tolerant broadcasts. Science of Computer Programming 4, 1 (April 1984), 1--15.

[vRG10] Robbert van Renesse and Rachid Guerraoui. Replication techniques for availability. Chapter 2, Replication Theory and Practice (B. Charron-Bost, F. Pedone, and A. Schiper, eds.), Lecture Notes in Computer Science, vol. 5959, Springer-Verlag, 2010, 19--40.

[vRS04] Robbert van Renesse and Fred B. Schneider. Chain Replication for Supporting High Throughput and Availability. Sixth Symposium on Operating Systems Design and Implementation (OSDI '04). USENIX Association (San Francisco, California, December 2004), 91--104.