Here is a high-level listing of the topics we will try to cover this semester. Readings are noted for each topic.
[CL85] K. Mani Chandy and Leslie Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems 3, 1 (February 1985), 63--75.
[EAWJ02] Mootaz Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys 34, 3 (September 2002), 375--408.
[H86] M. Herlihy. A quorum-consensus replication method for abstract data types. ACM Transactions on Computer Systems 4, 1 (February 1986), 32--53.
[KS] Ajay D. Kshemkalyani and Mukesh Singhal. Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press, Cambridge, UK, 2008.
[L78] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21, 7 (July 1978), 558--565.
[MR98] D. Malkhi and M. Reiter. Byzantine quorum systems. Distributed Computing 11, 4 (1998), 203--213.
[vRS04] R. van Renesse and Fred B. Schneider. Chain replication for supporting high throughput and availability. Sixth Symposium on Operating Systems Design and Implementation (OSDI '04), USENIX Association (San Francisco, California, December 2004), 91--104.
[S87] Fred B. Schneider. Understanding protocols for Byzantine clock synchronization. Cornell University Computer Science Technical Report TR 87-859, August 1987.
[S90] Fred B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys 22, 4 (December 1990), 299--319.