CS 614:
Advanced Course in Computer Systems
Spring 2002 (Ken Birman)
Ken's Office Hours: T, TR after class (10-11:30) in Upson 4119B
Prakash
Linga is also available to help: Upson 4161
CS 614 is a graduate level course in computing
systems, with a strong focus on operating systems and distributed
computing. The course is aimed at PhD
students seeking broad background in the areas of importance to modern systems
researchers, or having an interest in possible research topics. The course involves a considerable amount of
reading, and all students who take the class are expected to read all the papers,
for the experience to be satisfactory.
It is hoped that the class will be small, in which
case we’ll share the teaching load.
Students in the class will be asked to present one or more of the
lectures – I’ll do some too – in such a manner as to ensure that all the
students do at least one or two presentations.
Each lecture will consist of a formal presentation, normally covering
several papers, and is expected to go into considerable depth by focusing on some
aspect of the material and treating it thoroughly. We’ll be using powerpoint slides (we caÐÏࡱán make the slides
available on the web if anyone wants them).
A presentation will normally last perhaps 30 to 45 minutes. The lecture period will be followed by a
broader discussion that I’ll lead. All
students in the class are expected to participate.
In lieu of homework assignments, students in the
class will be asked to do some sort of operating systems
research project. You can select a
topic of interest to you; some topics may be more theoretical, but most should
be fairly practical. At the end of the
semester, you’ll write up the results of your work in the form of a short paper
conforming to the format and submission rules used by the Symposium on Operating
Systems Principles, which is the major conference in this area. Many of the papers we’ll be reading were
originally published in this conference.
Tips on preparing a presentation appear at the end of
this document.
The topics we’ll cover span a range of hot areas
within modern systems, although avoiding database research (we do touch on
transactional computing models as used in systems, but we won’t do this in the
context of databases). In particular,
we’ll study some of the major results in the theory of distributed computing
(covering topics such as Byzantine Agreement, asynchronous consensus, time and
clocks and ordering, consistent cuts and probabilistic protocols). We’ll look at systems issues relating to
high performance communication, including various approaches to reducing message
latency and improving response times.
We’ll study the problem of replication and consistency in distributed
systems, looking at virtual synchrony, paxos, and some of the other major
systems and models for the area, and we’ll look at the way that operating
systems are typically structured to accommodate protocols. We’ll then shift attention to language-level
issues including language embeddings of transactions and replication, mobility
and language-based security, and object-oriented computing architectures.
In each of these areas, I’ll identify two to four
published papers, mostly from the ACM online archives or from readily
accessible journals and conferences.
I’m hoping to avoid using real paper, so as much as possible we’ll work
from URL’s and scanned documents. A
typical lecture will be drawn out of one or more papers, but you really are
expected to read all of the relevant papers (and even some of the background
papers we don’t include), so the real challenge is to distill a tremendous
amount of information into a pithy presentation of the right length and
depth. During the discussion following
each presentation, we’ll try to understand the broader context and importance
of each result.
The papers we’ll be reading are dense, technically
meaty papers and the issues they raise are very broad. Expect to be challenged both by the amount
of reading and by the difficulty of synthesis – placing work into a broader
context that makes sense.
There is no point in taking cs614 if you lack a
solid background in the basics.
Accordingly, we’ll assume that the students in the class have all taken
(and mastered) the material covered in a conventional operating systems class,
an architecture class, and an undergraduate database class – cs314, cs414 and
cs432.
| Who |
Date |
Topic |
Readings |
|
|---|---|---|---|---|
Organizational meeting |
||||
| Ken | 1/24/02 |
Byzantine Agreement[kpb1] |
Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. Distributed Computing, 1(1):26-39, 1986. Note: The link leads to a different (longer) paper with some of the same content in the "middle", Section 2.2, but also much more on other topics. The journal version is in the Engineering Library but seems not to be online. If you read this online copy, focus on Section 2.2 and just skim the material before that. You can skip the material after the consensus topic. |
|
|
M. Rabin. Randomized Byzantine Generals. Proceedings 23rd FOCS (1983), 403-409. |
||||
| Leslie Lamport, Robert Shostak and Marshall Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems 4(3):382-401, July 1982 | ||||
|
Vassos Hadzilacos and Sam Toueg. A modular approach to fault-tolerant broadcast and related problems. Dept. of Computer Science TR 94-1425, Cornell University (May 1994). |
||||
| Ken | 1/29/02 |
|
||
|
Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, April 1985 |
||||
| The weakest failure detector for solving consensus; Tushar Deepak Chandra, Vassos Hadzilacos and Sam Toueg; J. ACM 43, 4 (Jul. 1996), Pages 685 - 722 | ||||
| Andre | 1/31/02 |
Time, Clocks[kpb3] |
Time, Clocks and the Ordering of Events in a Distributed System. Leslie Lamport, Communications of the ACM 21:7 (July 1978), 558-565. |
|
| Ken | 2/5/02 |
|
Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. Leslie Lamport. ACM Transactions on Programming Languages and Systems 6:2 (April 1984), 254-280. |
|
| Ken | 2/7/02 |
Optimal clock synchronization; T. K. Srikanth and Sam Toueg; J. ACM 34, 3 (Jul. 1987), Pages 626 - 645 |
||
|
Barbara Simons, Jennifer Welsh, Nancy Lynch. An Overview of Clock Synchronization. Springer-Verlag LNCS 448, 84-96, 1990. |
||||
|
Fred B. Schneider. Understanding protocols for Byzantine Clock Synchronization. Dept. of Computer Science TR 87-859, Cornell University (August 1987). |
||||
| Probabilistic Internal Clock Synchronization. Flaviu Cristian and Cristof Fetzer. | ||||
|
Lower bounds for convergence function based clock synchronization; Christof Fetzer and Flaviu Cristian; Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing , 1995, Pages 137 - 143 |
||||
| Prakash | 2/12/02 |
Consistent Cuts[kpb7] |
Distributed snapshots: determining global states of distributed systems; K. Mani Chandy and Leslie Lamport; ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), Pages 63 - 75 |
|
| Andre | 2/14/02 |
Transactions[kpb8]
|
Distributed programming in Argus; Barbara Liskov; Commun. ACM 31, 3 (Mar. 1988), Pages 300 - 312 |
|
|
Implementation of Argus; B. Liskov, D. Curtis, P. Johnson and R. Scheifer; Proceedings of the Eleventh ACM Symposium on Operating systems principles , 1987, Pages 111 - 122
|
||||
|
Experience with transactions in quickSilver; Frank Schmuck and Jim Wylie; Proceedings of the thirteenth ACM symposium on Operating systems principles , 1991, Pages 239 - 253 |
||||
|
Lightweight recoverable virtual memory; M. Satyanarayanan, Henry H. Mashburn, Puneet Kumar, David C. Steere and James J. Kistler; Proceedings of the fourteenth ACM symposium on Operating systems principles , 1993, Pages 146 - 160 |
||||
| Ken | 2/19/02 | |||
| Distributed process groups in the V Kernel; David R. Cheriton and Willy Zwaenepoel; ACM Trans. Comput. Syst. 3, 2 (May. 1985), Pages 77 - 107 | ||||
|
The process group approach to reliable distributed computing; Kenneth P. Birman; Commun. ACM 36, 12 (Dec. 1993), Pages 37 - 53 |
||||
|
Reliable communication in the presence of failures; Kenneth P. Birman and Thomas A. Joseph; ACM Trans. Comput. Syst. 5, 1 (Feb. 1987), Pages 47 - 76 |
||||
|
Lightweight causal and atomic group multicast; André Schiper, Kenneth Birman and Pat Stephenson; ACM Trans. Comput. Syst. 9, 3 (Aug. 1991), Pages 272 - 314 |
||||
|
Horus a flexible group communication system; Robbert van Renesse, Kenneth P. Birman and Silvano Maffeis; Commun. ACM 39, 4 (Apr. 1996), Pages 76 - 83 |
||||
|
Building reliable, high-performance communication systems from components; Xiaoming Liu, Christoph Kreitz, Robbert van Renesse, Jason Hickey, Mark Hayden, Kenneth Birman and Robert Constable; Proceedings of the 17th ACM symposium on operating systems principles on Operating systems principles , 1999, Pages 80 - 92 |
||||
| Bill | 2/21/02 |
Implementing fault-tolerant services using the state machine approach: a tutorial; Fred B. Schneider; ACM Comput. Surv. 22, 4 (Dec. 1990), Pages 299 - 319 |
||
|
The part-time parliament; Leslie Lamport; ACM Trans. Comput. Syst. 16, 2 (May. 1998), Pages 133 - 169 |
||||
|
Revisiting the Paxos Algorithm R. De Prisco, Butler Lampson, Nancy Lynch. Workshop on Distributed Algorithms (WDAG'97, Saarbrucken, Germany, September 1997), volume 1320 of Lecture Notes in Computer Science, pages 111-125. Springer-Verlag 1997. |
||||
| no class | 2/26/02 |
Group Communication Specifications: A Comprehensive Study. Roman Vitenberg, Idit Keidar, Gregory V. Chockler and Danny Dolev. Accepted for publication, Computing Surveys. |
||
|
On the impossibility of group membership; Tushar Deepak Chandra, Vassos Hadzilacos, Sam Toueg and Bernadette Charron-Bost; Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing , 1996, Pages 322 - 330 |
||||
| Jed | 2/28/02 |
The Design of the Transis System. Danny Dolev and Dahlia Malkhi. Technical report, Hebrew University (Jerusalem), 1995 |
||
|
Totally Ordered Broadcast in the Face of Network Partitions: Exploiting Group Communication for Replication in Partitionable Networks. Idit Keidar and Danny Dolev |
||||
| Prakash | 3/5/02 |
Multicast routing in datagram internetworks and extended LANs; Stephen E. Deering and David R. Cheriton; ACM Trans. Comput. Syst. 8, 2 (May. 1990), Pages 85 - 110 |
||
|
Scalable Reliable Multicast (SRM) and Application Level Framing (ALF) (several papers) |
||||
|
Lin, John C. and Paul, Sanjoy, RMTP: A Reliable Multicast Transport Protocol, IEEE INFOCOM '96. Also available in from a UK/European mirror. |
||||
|
PGM: Pretty Good Multicast (PGM) Transport Protocol Specification Jim Speaksman.
|
||||
| Ashish | 3/7/02 |
Epidemic algorithms for replicated database maintenance; Alan Demers, Dan Greene, Carl Hauser, Wes Irish and John Larson; Proceedings of the Sixth Annual ACM Symposium on Principles of distributed computing , 1987, Pages 1 - 12 |
||
|
Bimodal multicast; Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu and Yaron Minsky; ACM Trans. Comput. Syst. 17, 2 (May. 1999), Pages 41 - 88 |
||||
|
Managing update conflicts in Bayou, a weakly connected replicated storage system; D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer and C. H. Hauser; Proceedings of the fifteenth ACM symposium on Operating systems principles 1995, 172 – 182. |
||||
|
Flexible update propagation for weakly consistent replication; Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers; Proceedings of the sixteenth ACM symposium on Operating systems principles , 1997, Pages 288 - 301 |
||||
| Howard | 3/12/02 |
Performance of the Firefly RPC; Michael D. Schroeder and Michael Burrows; ACM Trans. Comput. Syst. 8, 1 (Feb. 1990), Pages 1 - 17 |
||
|
Lightweight remote procedure call; Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska and Henry M. Levy; ACM Trans. Comput. Syst. 8, 1 (Feb. 1990), Pages 37 - 55 |
||||
|
Scheduler activations: effective kernel support for the user-level management of parallelism; Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska and Henry M. Levy; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 53 - 79 |
||||
| Joel | 3/14/02 |
Novel Communications Architectures[kpb16]
|
Active messages: a mechanism for integrating communication and computation; Thorsten von Eicken, David E. Culler, Seth Copen Goldstein and Klaus Erik Schauser; 25 years of the international symposia on Computer architecture (selected papers) , 1998, Pages 430 - 440 |
|
|
U-Net: a user-level network interface for parallel and distributed computing (includes URL); T. von Eicken, A. Basu, V. Buch and W. Vogels; Proceedings of the fifteenth ACM symposium on Operating systems principles , 1995, Pages 40 - 53 |
||||
| 3/19/02 |
|
|||
| 3/21/02 | ||||
| Greg | 3/26/02 |
Modularity
and costs[kpb17]
|
N. C. Hutchinson and L. L. Peterson. The x-Kernel: An architecture for implementing network protocols. IEEE Transactions on Software Engineering, 17(1):64#76, Jan. 1991. |
|
|
Protocol implementation using integrated layer processing; Torsten Braun and Christophe Diot; Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication , 1995, Pages 151 - 161 |
||||
|
Masking the overhead of protocol layering; Robbert van Renesse; Conference proceedings on Applications, technologies, architectures, and protocols for computer communications , 1996, Pages 96 - 104 |
||||
| Michael | 3/28/02 |
Rover:
a toolkit for mobile information access, A. D.Joseph, A. F.de Lespinasse,
J. A.Tauber, D. K.Gifford, and M. F.Kaashoek, Proceedings of the fifteenth ACM symposium on Operating systems principles, December 3 -6, 1995, Copper Mountain, CO USA, Pages 156-171 |
||
| Mobile Computing with the Rover Toolkit, Anthony D. Joseph, Joshua A. Tauber, and M. Frans Kaashoek, IEEE Transactions on Computers: Special issue on Mobile Computing, 46(3), March 1997 | ||||
| Isolation-Only Transactions for Mobile Computing, Lu, Q., Satyanarayanan, M. Operating Systems Review, Apr. 1994, Vol. 28, No. 2, pp. 81-87 | ||||
| Bill | 4/2/02 |
Tuple spaces[kpb19]
|
Linda in context; Nicholas Carriero and David Gelernter; Commun. ACM 32, 4 (Apr. 1989), Pages 444 - 458 |
|
|
Generative communication in Linda; David Gelernter; ACM Trans. Program. Lang. Syst. 7, 1 (Jan. 1985), Pages 80 - 112 |
||||
|
The Jini architecture for network-centric computing; Jim Waldo; Commun. ACM 42, 7 (Jul. 1999), Pages 76 - 82 |
||||
|
JavaSpacesTechnology , Sun Microsystems Inc. |
||||
| Uri | 4/4/02 |
|
Rick Rashid. The Mach Microkernel |
|
|
Experiences with the Amoeba distributed operating system; Andrew S. Tanenbaum, Robbert van Renesse, Hans van Staveren, Gregory J. Sharp and Sape J. Mullender; Commun. ACM 33, 12 (Dec. 1990), Pages 46 - 63 |
||||
| Extensibility safety and performance in the SPIN operating system; B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers and S. Eggers; Proceedings of the fifteenth ACM symposium on Operating systems principles , 1995, Pages 267 - 283 | ||||
| Exokernel: an operating system architecture for application-level resource management; D. R. Engler, M. F. Kaashoek and J. O'Toole; Proceedings of the fifteenth ACM symposium on Operating systems principles , 1995, Pages 251 - 266 | ||||
| Application performance and flexibility on Exokernel systems; M. Frans Kaashioek, Dawson R. Engler, Gregory R. Ganger, He´ctor M. Bricen˜o, Russell Hunt, David Mazier`res, Thomas Pinckney, Robert Grimm, John Jannotti and Kenneth Mackenzie; Proceedings of the sixteenth ACM symposium on Operating systems principles , 1997, Pages 52 - 65 | ||||
| Serge | 4/9/02 |
Astrolabe:
A Robust and Scalable Technology for Distributed System Monitoring,
Management, and Data Mining.
Robbert van Renesse, and Kenneth Birman.
Submitted to ACM TOCS, November 2001 |
||
|
Scalable
Management and Data Mining Using Astrolabe.
Robbert van Renesse, Ken Birman. Submitted to the First International
Workshop on Peer-to-Peer Systems (IPTPS 2002). |
||||
|
Scalable
Data Fusion Using Astrolabe.
Robbert van Renesse, Ken Birman. Submitted
to the Fifth International Conference on Information Fusion 2002 (IF 2002).
December 2001. |
||||
| Jed | 4/11/02 |
Scalable clusters[kpb22] |
Vogels, W., Dumitriu, D., Birman, K. Gamache, R., Short, R., Vert, J., Massa, M., Barrera, J., and Gray, J., "The Design and Architecture of the Microsoft Cluster Service -- A Practical Approach to High-Availability and Scalability", Proceedings of the 28th symposium on Fault-Tolerant Computing, Munich, Germany, June 1998 |
|
|
Frangipani a scalable distributed file system; Chandramohan A. Thekkath, Timothy Mann and Edward K. Lee; Proceedings of the sixteenth ACM symposium on Operating systems principles , 1997, Pages 224 - 237 |
||||
| Greg | 4/16/02 |
Using Group Communication Technology to Implement a Reliable and Scalable Distributed IN Coprocessor. Roy Friedman and Ken Birman Document ID ncstrl.cornell/TR96-1605 |
||
|
Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service; Yasushi Saito, Brian N. Bershad and Henry M. Levy; Proceedings of the 17th ACM symposium on operating systems principles on Operating systems principles , 1999, Pages 1 - 15 |
||||
| Michael | 4/18/02 |
Measurements of a distributed file system; Mary G. Baker, John H. Hartman, Michael D. Kupfer, Ken W. Shirriff and John K. Ousterhout; Proceedings of the thirteenth ACM symposium on Operating systems principles , 1991, Pages 198 - 212 |
||
|
File system usage in Windows NT 4.0; Werner Vogels; Proceedings of the 17th ACM symposium on operating systems principles on Operating systems principles , 1999, Pages 93 - 109 |
||||
| Joel | 4/23/02 |
Integrating security in a large distributed system; M. Satyanarayanan; ACM Trans. Comput. Syst. 7, 3 (Aug. 1989), Pages 247 - 280 |
||
|
Scale and performance in a distributed file system; John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham and Michael J. West; ACM Trans. Comput. Syst. 6, 1 (Feb. 1988), Pages 51 - 81 |
||||
|
Caching in the Sprite network file system; Michael N. Nelson, Brent B. Welch and John K. Ousterhout; ACM Trans. Comput. Syst. 6, 1 (Feb. 1988), Pages 134 - 154 |
||||
| Howard | 4/25/02 |
File Systems: Designs[kpb26] |
The design and implementation of a log-structured file system; Mendel Rosenblum and John K. Ousterhout; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 26 - 52 |
|
|
The Zebra striped network file system; John H. Hartman and John K. Ousterhout; Proceedings of the fourteenth ACM symposium on Operating systems principles , 1993, Pages 29 - |
||||
| Serge | ||||