CS 614: Advanced Course in Computer Systems

CS 614: Advanced Course in Computer Systems

Spring 2002 (Ken Birman)

Ken's Office Hours: T, TR after class (10-11:30) in Upson 4119B

Prakash Linga is also available to help: Upson 4161

CS 614 is a graduate level course in computing systems, with a strong focus on operating systems and distributed computing. The course is aimed at PhD students seeking broad background in the areas of importance to modern systems researchers, or having an interest in possible research topics. The course involves a considerable amount of reading, and all students who take the class are expected to read all the papers, for the experience to be satisfactory.

It is hoped that the class will be small, in which case we’ll share the teaching load. Students in the class will be asked to present one or more of the lectures – I’ll do some too – in such a manner as to ensure that all the students do at least one or two presentations. Each lecture will consist of a formal presentation, normally covering several papers, and is expected to go into considerable depth by focusing on some aspect of the material and treating it thoroughly. We’ll be using powerpoint slides (we caÐÏà¡±án make the slides available on the web if anyone wants them). A presentation will normally last perhaps 30 to 45 minutes. The lecture period will be followed by a broader discussion that I’ll lead. All students in the class are expected to participate.

In lieu of homework assignments, students in the class will be asked to do some sort of operating systems research project. You can select a topic of interest to you; some topics may be more theoretical, but most should be fairly practical. At the end of the semester, you’ll write up the results of your work in the form of a short paper conforming to the format and submission rules used by the Symposium on Operating Systems Principles, which is the major conference in this area. Many of the papers we’ll be reading were originally published in this conference. Tips on preparing a presentation appear at the end of this document.

The topics we’ll cover span a range of hot areas within modern systems, although avoiding database research (we do touch on transactional computing models as used in systems, but we won’t do this in the context of databases). In particular, we’ll study some of the major results in the theory of distributed computing (covering topics such as Byzantine Agreement, asynchronous consensus, time and clocks and ordering, consistent cuts and probabilistic protocols). We’ll look at systems issues relating to high performance communication, including various approaches to reducing message latency and improving response times. We’ll study the problem of replication and consistency in distributed systems, looking at virtual synchrony, paxos, and some of the other major systems and models for the area, and we’ll look at the way that operating systems are typically structured to accommodate protocols. We’ll then shift attention to language-level issues including language embeddings of transactions and replication, mobility and language-based security, and object-oriented computing architectures.

In each of these areas, I’ll identify two to four published papers, mostly from the ACM online archives or from readily accessible journals and conferences. I’m hoping to avoid using real paper, so as much as possible we’ll work from URL’s and scanned documents. A typical lecture will be drawn out of one or more papers, but you really are expected to read all of the relevant papers (and even some of the background papers we don’t include), so the real challenge is to distill a tremendous amount of information into a pithy presentation of the right length and depth. During the discussion following each presentation, we’ll try to understand the broader context and importance of each result.

The papers we’ll be reading are dense, technically meaty papers and the issues they raise are very broad. Expect to be challenged both by the amount of reading and by the difficulty of synthesis – placing work into a broader context that makes sense.

There is no point in taking cs614 if you lack a solid background in the basics. Accordingly, we’ll assume that the students in the class have all taken (and mastered) the material covered in a conventional operating systems class, an architecture class, and an undergraduate database class – cs314, cs414 and cs432.


Who	Date	Topic		Readings
		Organizational meeting
Ken	1/24/02	Byzantine Agreement [kpb1]	Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. Distributed Computing, 1(1):26-39, 1986. Note: The link leads to a different (longer) paper with some of the same content in the "middle", Section 2.2, but also much more on other topics. The journal version is in the Engineering Library but seems not to be online. If you read this online copy, focus on Section 2.2 and just skim the material before that. You can skip the material after the consensus topic.
			M. Rabin. Randomized Byzantine Generals. Proceedings 23^rd FOCS (1983), 403-409.
			Leslie Lamport, Robert Shostak and Marshall Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems 4(3):382-401, July 1982
			Vassos Hadzilacos and Sam Toueg. A modular approach to fault-tolerant broadcast and related problems. Dept. of Computer Science TR 94-1425, Cornell University (May 1994).
Ken	1/29/02	Consensus[kpb2]
			Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, April 1985
			The weakest failure detector for solving consensus; Tushar Deepak Chandra, Vassos Hadzilacos and Sam Toueg; J. ACM 43, 4 (Jul. 1996), Pages 685 - 722
Andre	1/31/02	Time, Clocks[kpb3] Vector time Andre's slides. Ken's slides.	Time, Clocks and the Ordering of Events in a Distributed System. Leslie Lamport, Communications of the ACM 21:7 (July 1978), 558-565.
Ken	2/5/02	More on Time[kpb4]	Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. Leslie Lamport. ACM Transactions on Programming Languages and Systems 6:2 (April 1984), 254-280.
Ken	2/7/02	Clock synchronization [kpb5]	Optimal clock synchronization; T. K. Srikanth and Sam Toueg; J. ACM 34, 3 (Jul. 1987), Pages 626 - 645
			Barbara Simons, Jennifer Welsh, Nancy Lynch. An Overview of Clock Synchronization. Springer-Verlag LNCS 448, 84-96, 1990.
			Fred B. Schneider. Understanding protocols for Byzantine Clock Synchronization. Dept. of Computer Science TR 87-859, Cornell University (August 1987).
			Probabilistic Internal Clock Synchronization. Flaviu Cristian and Cristof Fetzer.
			Lower bounds for convergence function based clock synchronization; Christof Fetzer and Flaviu Cristian; Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing , 1995, Pages 137 - 143
Prakash	2/12/02	Consistent Cuts[kpb7] Old slide set from 2001	Distributed snapshots: determining global states of distributed systems; K. Mani Chandy and Leslie Lamport; ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), Pages 63 - 75
Andre	2/14/02	Transactions[kpb8] Andre's slides (pdf) Old slide set from 2001	Distributed programming in Argus; Barbara Liskov; Commun. ACM 31, 3 (Mar. 1988), Pages 300 - 312
			Implementation of Argus; B. Liskov, D. Curtis, P. Johnson and R. Scheifer; Proceedings of the Eleventh ACM Symposium on Operating systems principles , 1987, Pages 111 - 122
			Experience with transactions in quickSilver; Frank Schmuck and Jim Wylie; Proceedings of the thirteenth ACM symposium on Operating systems principles , 1991, Pages 239 - 253
			Lightweight recoverable virtual memory; M. Satyanarayanan, Henry H. Mashburn, Puneet Kumar, David C. Steere and James J. Kistler; Proceedings of the fourteenth ACM symposium on Operating systems principles , 1993, Pages 146 - 160
Ken	2/19/02	Group Communication[kpb9]
			Distributed process groups in the V Kernel; David R. Cheriton and Willy Zwaenepoel; ACM Trans. Comput. Syst. 3, 2 (May. 1985), Pages 77 - 107
			The process group approach to reliable distributed computing; Kenneth P. Birman; Commun. ACM 36, 12 (Dec. 1993), Pages 37 - 53
			Reliable communication in the presence of failures; Kenneth P. Birman and Thomas A. Joseph; ACM Trans. Comput. Syst. 5, 1 (Feb. 1987), Pages 47 - 76
			Lightweight causal and atomic group multicast; André Schiper, Kenneth Birman and Pat Stephenson; ACM Trans. Comput. Syst. 9, 3 (Aug. 1991), Pages 272 - 314
			Horus a flexible group communication system; Robbert van Renesse, Kenneth P. Birman and Silvano Maffeis; Commun. ACM 39, 4 (Apr. 1996), Pages 76 - 83
			Building reliable, high-performance communication systems from components; Xiaoming Liu, Christoph Kreitz, Robbert van Renesse, Jason Hickey, Mark Hayden, Kenneth Birman and Robert Constable; Proceedings of the 17th ACM symposium on operating systems principles on Operating systems principles , 1999, Pages 80 - 92
Bill	2/21/02	State machines[kpb10]	Implementing fault-tolerant services using the state machine approach: a tutorial; Fred B. Schneider; ACM Comput. Surv. 22, 4 (Dec. 1990), Pages 299 - 319
			The part-time parliament; Leslie Lamport; ACM Trans. Comput. Syst. 16, 2 (May. 1998), Pages 133 - 169
			Revisiting the Paxos Algorithm R. De Prisco, Butler Lampson, Nancy Lynch. Workshop on Distributed Algorithms (WDAG'97, Saarbrucken, Germany, September 1997), volume 1320 of Lecture Notes in Computer Science, pages 111-125. Springer-Verlag 1997.
no class	2/26/02	Group Membership[kpb11]	Group Communication Specifications: A Comprehensive Study. Roman Vitenberg, Idit Keidar, Gregory V. Chockler and Danny Dolev. Accepted for publication, Computing Surveys.
no class	2/26/02	Group Membership[kpb11]	On the impossibility of group membership; Tushar Deepak Chandra, Vassos Hadzilacos, Sam Toueg and Bernadette Charron-Bost; Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing , 1996, Pages 322 - 330
Jed	2/28/02	Multicast protocols[kpb12]	The Design of the Transis System. Danny Dolev and Dahlia Malkhi. Technical report, Hebrew University (Jerusalem), 1995
Jed	2/28/02	Multicast protocols[kpb12]	Totally Ordered Broadcast in the Face of Network Partitions: Exploiting Group Communication for Replication in Partitionable Networks. Idit Keidar and Danny Dolev
Prakash	3/5/02	Network multicast[kpb13]	Multicast routing in datagram internetworks and extended LANs; Stephen E. Deering and David R. Cheriton; ACM Trans. Comput. Syst. 8, 2 (May. 1990), Pages 85 - 110
			Scalable Reliable Multicast (SRM) and Application Level Framing (ALF) (several papers)
			Lin, John C. and Paul, Sanjoy, RMTP: A Reliable Multicast Transport Protocol, IEEE INFOCOM '96. Also available in from a UK/European mirror.
			PGM: Pretty Good Multicast (PGM) Transport Protocol Specification Jim Speaksman.
Ashish	3/7/02	Probabilistic approaches[kpb14]	Epidemic algorithms for replicated database maintenance; Alan Demers, Dan Greene, Carl Hauser, Wes Irish and John Larson; Proceedings of the Sixth Annual ACM Symposium on Principles of distributed computing , 1987, Pages 1 - 12
			Bimodal multicast; Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu and Yaron Minsky; ACM Trans. Comput. Syst. 17, 2 (May. 1999), Pages 41 - 88
			Managing update conflicts in Bayou, a weakly connected replicated storage system; D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer and C. H. Hauser; Proceedings of the fifteenth ACM symposium on Operating systems principles 1995, 172 – 182.
			Flexible update propagation for weakly consistent replication; Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers; Proceedings of the sixteenth ACM symposium on Operating systems principles , 1997, Pages 288 - 301
Howard	3/12/02	Fast Communication[kpb15]	Performance of the Firefly RPC; Michael D. Schroeder and Michael Burrows; ACM Trans. Comput. Syst. 8, 1 (Feb. 1990), Pages 1 - 17
			Lightweight remote procedure call; Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska and Henry M. Levy; ACM Trans. Comput. Syst. 8, 1 (Feb. 1990), Pages 37 - 55
			Scheduler activations: effective kernel support for the user-level management of parallelism; Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska and Henry M. Levy; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 53 - 79
Joel	3/14/02	Novel Communications Architectures [kpb16]	Active messages: a mechanism for integrating communication and computation; Thorsten von Eicken, David E. Culler, Seth Copen Goldstein and Klaus Erik Schauser; 25 years of the international symposia on Computer architecture (selected papers) , 1998, Pages 430 - 440
Joel	3/14/02	Novel Communications Architectures [kpb16]	U-Net: a user-level network interface for parallel and distributed computing (includes URL); T. von Eicken, A. Basu, V. Buch and W. Vogels; Proceedings of the fifteenth ACM symposium on Operating systems principles , 1995, Pages 40 - 53
	3/19/02	No class: spring break
	3/21/02	No class: spring break
Greg	3/26/02	Modularity and costs [kpb17]	N. C. Hutchinson and L. L. Peterson. The x-Kernel: An architecture for implementing network protocols. IEEE Transactions on Software Engineering, 17(1):64#76, Jan. 1991.
			Protocol implementation using integrated layer processing; Torsten Braun and Christophe Diot; Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication , 1995, Pages 151 - 161
			Masking the overhead of protocol layering; Robbert van Renesse; Conference proceedings on Applications, technologies, architectures, and protocols for computer communications , 1996, Pages 96 - 104
Michael	3/28/02	Toolkits for building Mobile systems	Rover: a toolkit for mobile information access, A. D.Joseph, A. F.de Lespinasse, J. A.Tauber, D. K.Gifford, and M. F.Kaashoek, Proceedings of the fifteenth ACM symposium on Operating systems principles, December 3 -6, 1995, Copper Mountain, CO USA, Pages 156-171
			Mobile Computing with the Rover Toolkit, Anthony D. Joseph, Joshua A. Tauber, and M. Frans Kaashoek, IEEE Transactions on Computers: Special issue on Mobile Computing, 46(3), March 1997
			Isolation-Only Transactions for Mobile Computing, Lu, Q., Satyanarayanan, M. Operating Systems Review, Apr. 1994, Vol. 28, No. 2, pp. 81-87
Bill	4/2/02	Tuple spaces [kpb19]	Linda in context; Nicholas Carriero and David Gelernter; Commun. ACM 32, 4 (Apr. 1989), Pages 444 - 458
			Generative communication in Linda; David Gelernter; ACM Trans. Program. Lang. Syst. 7, 1 (Jan. 1985), Pages 80 - 112
			The Jini architecture for network-centric computing; Jim Waldo; Commun. ACM 42, 7 (Jul. 1999), Pages 76 - 82
			JavaSpacesTechnology , Sun Microsystems Inc.
Uri	4/4/02	Macro vs Microkernels,[kpb20] Extensible Kernels[kpb21]	Rick Rashid. The Mach Microkernel
			Experiences with the Amoeba distributed operating system; Andrew S. Tanenbaum, Robbert van Renesse, Hans van Staveren, Gregory J. Sharp and Sape J. Mullender; Commun. ACM 33, 12 (Dec. 1990), Pages 46 - 63
			The Chorus Microkernel
			Extensibility safety and performance in the SPIN operating system; B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers and S. Eggers; Proceedings of the fifteenth ACM symposium on Operating systems principles , 1995, Pages 267 - 283
			Exokernel: an operating system architecture for application-level resource management; D. R. Engler, M. F. Kaashoek and J. O'Toole; Proceedings of the fifteenth ACM symposium on Operating systems principles , 1995, Pages 251 - 266
			Application performance and flexibility on Exokernel systems; M. Frans Kaashioek, Dawson R. Engler, Gregory R. Ganger, He´ctor M. Bricen˜o, Russell Hunt, David Mazier&grave;res, Thomas Pinckney, Robert Grimm, John Jannotti and Kenneth Mackenzie; Proceedings of the sixteenth ACM symposium on Operating systems principles , 1997, Pages 52 - 65

Serge	4/9/02	Astrolabe	Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining. Robbert van Renesse, and Kenneth Birman. Submitted to ACM TOCS, November 2001
			Scalable Management and Data Mining Using Astrolabe. Robbert van Renesse, Ken Birman. Submitted to the First International Workshop on Peer-to-Peer Systems (IPTPS 2002).
			Scalable Data Fusion Using Astrolabe. Robbert van Renesse, Ken Birman. Submitted to the Fifth International Conference on Information Fusion 2002 (IF 2002). December 2001.
Jed	4/11/02	Scalable clusters [kpb22]	Vogels, W., Dumitriu, D., Birman, K. Gamache, R., Short, R., Vert, J., Massa, M., Barrera, J., and Gray, J., "The Design and Architecture of the Microsoft Cluster Service -- A Practical Approach to High-Availability and Scalability", Proceedings of the 28^th symposium on Fault-Tolerant Computing, Munich, Germany, June 1998
Jed	4/11/02	Scalable clusters [kpb22]	Frangipani a scalable distributed file system; Chandramohan A. Thekkath, Timothy Mann and Edward K. Lee; Proceedings of the sixteenth ACM symposium on Operating systems principles , 1997, Pages 224 - 237
Greg	4/16/02	Scalable Applications and Real-Time Response[kpb23]	Using Group Communication Technology to Implement a Reliable and Scalable Distributed IN Coprocessor. Roy Friedman and Ken Birman *Document ID* ncstrl.cornell/TR96-1605
Greg	4/16/02	Scalable Applications and Real-Time Response[kpb23]	Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service; Yasushi Saito, Brian N. Bershad and Henry M. Levy; Proceedings of the 17th ACM symposium on operating systems principles on Operating systems principles , 1999, Pages 1 - 15
Michael	4/18/02	File Systems: Numbers[kpb24]	Measurements of a distributed file system; Mary G. Baker, John H. Hartman, Michael D. Kupfer, Ken W. Shirriff and John K. Ousterhout; Proceedings of the thirteenth ACM symposium on Operating systems principles , 1991, Pages 198 - 212
Michael	4/18/02	File Systems: Numbers[kpb24]	File system usage in Windows NT 4.0; Werner Vogels; Proceedings of the 17th ACM symposium on operating systems principles on Operating systems principles , 1999, Pages 93 - 109
Joel	4/23/02	Caching[kpb25]	Integrating security in a large distributed system; M. Satyanarayanan; ACM Trans. Comput. Syst. 7, 3 (Aug. 1989), Pages 247 - 280
			Scale and performance in a distributed file system; John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham and Michael J. West; ACM Trans. Comput. Syst. 6, 1 (Feb. 1988), Pages 51 - 81
			Caching in the Sprite network file system; Michael N. Nelson, Brent B. Welch and John K. Ousterhout; ACM Trans. Comput. Syst. 6, 1 (Feb. 1988), Pages 134 - 154
Howard	4/25/02	File Systems: Designs[kpb26]	The design and implementation of a log-structured file system; Mendel Rosenblum and John K. Ousterhout; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 26 - 52
Howard	4/25/02	File Systems: Designs[kpb26]	The Zebra striped network file system; John H. Hartman and John K. Ousterhout; Proceedings of the fourteenth ACM symposium on Operating systems principles , 1993, Pages 29 -
Serge	4/30/02	Peer-to-Peer File Systems[kpb27]	A Distributed Decentralized Information Storage and Retrieval System. Ian Clark. University of Edinburgh, 1999 (Original unpublished technical report on Freenet)
			Storage Management and Caching in PAST, Ant Rowston and Peter Druschel; Proc. 21st ACM Symposium on Operating Systems Principles, 2001
			Wide-Area Cooperative Storage with CFS Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris (MIT) Ion Stoica (UC Berkeley)
Uri	5/2/02	Nomadic and mobile computing[kpb28]	Managing update conflicts in Bayou, a weakly connected replicated storage system; D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer and C. H. Hauser; Proceedings of the fifteenth ACM symposium on Operating systems principles 1995, 172 – 182.
			Flexible update propagation for weakly consistent replication; Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers; Proceedings of the sixteenth ACM symposium on Operating systems principles , 1997, Pages 288 - 301
			Disconnected operation in the Coda File System; James J. Kistler and M. Satyanarayanan; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 3 - 25

How to prepare a cs614 presentation

Each time we meet, we’ll start with a fairly short presentation by some student (or by Ken), lasting perhaps 45 minutes, after which there will be a discussion lasting an additional half hour or so. Preparing the lecture is a significant challenge and you should expect to spend as much as a week or two working on it before the date of your presentation.

For each date, there are several suggested references. Any one of these papers could be the subject of a full-length lecture in itself, and several would take multiple lectures to cover in any depth. Accordingly, you face a tough choice: either learn to speak at an extremely rapid rate, which will be completely incomprehensible, or select within the material, focusing on a core aspect or idea and leaving the surrounding material for others to either absorb simply by virtue of having done the readings, or to raise during the discussion.

Your choice of material and approach to the presentation should satisfy several properties:

1) Within 45 minutes you only have time to convey a very small number of technical points. For example, you probably have time to state a single theorem and to present its proof, but even if the paper contains many theorems, unless they are very simple you won’t have time to talk about more than one (perhaps two).

2) Every talk needs to be placed in context. You need to plan on starting by spending at least a few minutes (10 would be a good estimate) explaining what the problem attacked by the authors was, why this was an important problem (what alternative ideas were being advanced at the time?) and what prior work people need to be aware of. You should say enough about this prior work to place the results you’ll focus on into a broader setting.

3) You should tell us what you’ll be telling us, but in a non-technical way. For example, if you are presenting the “consistent cuts” paper, give some intuition into what a consistent cut is all about before trying to define the idea formally. As you move into the formal material, remind us of this intuition and show us how the formal definition captures the intuitive one. Your job is to make the formalism seem natural. If you are talking about a systems result, try and talk about the thing the authors are doing from a high level – why did this excite them, and the community? What was the key question they were interested by, and what was their “angle”?

4) Now, perhaps 10 to 15 minutes into your “slot” you’ll get to the more detailed material. This is where you might briefly explain a formalism and present a theorem and its proof, or lay out the major architectural pieces of a system and explain how they were put together.

5) In the second half of the technical part of your talk, you might illustrate the way that a proof works using a small example, or present the experimental data associated with a systems paper. Systems people tend to be nearly obsessive about experimental data. It should be presented lovingly with attention to the insights that can only be gained by really building something and putting it through its paces. Where are the fundamental costs in an approach? How well does it scale? Are their little warts and hidden problems that the data reveals?

6) In the final part of your talk, you should wrap things together by reminding us what the original premise of the paper you presented was, how the authors attacked the problem, and what the data revealed about the limitations and successes of their approach. Express your opinions at this stage. For a theory talk, try and put the results back into the broader context. Do they say something broad and powerful, or is the main insight narrow?

Your talk will need to be on Powerpoint slides (or some similar system, if you prefer something else). Powerpoint makes suggestions about font sizes but in fact you can certainly pack more on a slide than is possible using the intial choices. Keep in mind that your audience has to be able to read the slides when projected on a screen at a sensible size! Either make plastic slides, or have your material on a laptop that can connect to the overhead display system.

PLEASE DON’T MAKE US WAIT 10 MINUTES WHILE YOU FIDDLE WITH CONNECTIONS AT THE START OF CLASS. COME EARLY AND MAKE SURE YOUR DISPLAY SETUP WILL WORK. DO IT THE DAY BEFORE IF POSSIBLE.

People who have never given formal talks before often make one of two kinds of mistakes: they estimate their own pace incorrectly, or they simply make far too many slides. Generally, figure that you can cover one slide with text on it per minute. About half of your material should be graphics or illustrations of some sort – people get bored with endless text.

So, for a 45 minute talk, you would probably make up something like 30 to 35 slides of actual text and an addition 10 to 15 that are mostly pictures you’ll explain during the talk. Often you can copy some of this material right out of the online versions of papers or from the web site of the author. However, we probably won’t have a network connection in the lecture room, so your slides do need to be self contained.

A fast speaker who is comfortable in front of a group and familiar with her slides could perhaps use as many as 60 slides for a 45 minute talk. But she would need to have relatively little text on each slide to get away with this.

Feel free to stray from the papers I’ve suggested. But do read the papers I suggested. It may be wise to start reading them two weeks early, since many are tough, high-content papers with a lot of meat to them. People attending the class really should read them too, but probably will do so the day before your talk and won’t have time to really pick away at the material and think hard about it. So your job is to take them to a deeper level of understanding than they will have gotten in reading the papers quickly over a few beers. And their job is to have read the papers carefully enough to contribute to a spirited discussion in class.

CS614 Projects

Each student is expected to undertake a project as part of cs614. A typical project would result in a short paper of the sort we’ll be reading – think in terms of the format and length associated with papers that appear in the top ACM SIGOPS conference, Symposium on Operating Systems Principles. These papers typically take on some question or new idea and present it in a systematic manner, including material appropriate to evaluating the idea. Your idea might be about operating systems architecture, protocols, network-level mechanisms (we won’t have time to cover these in cs614, unfortunately, but they are “fair game” for your project), etc. If your work is more formal, a treatment focused on a theoretical topic would be fine, in which case the better conference to look at for ideas and format is ACM Principles of Distributed Computing (PODC). For practical work, you might undertake an implementation, use simulation, or evaluate existing systems in an aggressive way.

The hope is that your CS614 project will be of a nature that could be publishable and that could lead to further research.

[kpb1]This is a big – huge, really – topic for the theory community. I picked a few representative papers but there are probably 200 or 300 papers, literally, in the area. The first paper uses elegant geometrical symmetry arguments to demonstrate the various Byzantine impossibility results. The second one is a ground-breaking paper that overcomes the basic Byzantine limits using a probabilistic algorithm (a theme to which we’ll return later in the course, because it represents the most important emerging area in scalable distributed computing – although the Byzantine model is not really one we use very much). The last papers include the original one introducing the problem and a survey by Hadzilacos and Toueg looking at a variety of solutions, including some of the probabilistic ones that people currently favor.

[kpb2] Whereas the Byzantine work was mostly done in the synchronous model, consensus is posed in a purely asynchronous one. The general view is that impossibility results for the asynchronous model hold in the real world, although this is debatable (and we should debate it). The FLP theorem is quite hard to understand – we’ll need to look at it carefully to gain intuition into the way it works. The Chandra/Toueg result is the best known for the area, but depends on a type of failure detector that, in practice, can only be approximated.

[kpb3] This paper introduces the idea of using “logical” time rather than physical time when designing distributed programs. We’ll be using this concept (indirectly) in many ways throughout the course, so we’ll spend a whole lecture on it.

[kpb4] To finish off the the whole issue of time, we’ll look at a more subtle paper that combines logical and real time. This paper basically introduces the "state machine" approach to fault-tolerance, although it also does some other things with time.

[kpb5] Overall, clock synchronization is a natural topic to consider in light of the preceding papers. The lecture should probably focus on the surveys, although I’ve included two very good synchronization algorithms.

[kpb7] The idea of a consistent cut, and their applications, is a natural outgrowth of the idea of logical cuts,

[kpb8] Transactions were the hot topic in distributed computing during the early 1980’s. But over time the luster faded and by the time of Schmuck’s paper on Quicksilver, only a few pieces of the model were being retained. There are too many papers for this one meeting so we’ll need to focus on some sub-topics.

[kpb9] Group communication emerged for two reasons: as a natural way to group server programs into services but also (simultaneously) as a way to take apart the transactional model into a form of transaction on a set of processes with semantics, but without the negatives of the transactional model itself (which by this time was seen as costly, slow and restrictive).

[kpb10] By now, we’re back to theory: the style of replication supported by groups is called “state machine” replication. Fred wrote a great tutorial on this. Meanwhile, Lamport developed his own approach to implementing state machine replication in groups. A puzzle: how does this relate to what we just discussed in the previous lecture? (Hint: there is a paper by Dolev, Keidar and, I think, Chockler on precisely this question).

[kpb11] We’ll look at the problem of membership agreement. For a while, people thought that membership protocols evaded consensus. By now we know better and hence know that group membership is “impossible” in a purely asynchronous model. Question: is group membership possible in a practical model? By the way, there is a flaw in Aleta’

[kpb12] A sample of some of the other work on multicast. Like transactions there are dozens of systems and papers in this area.

[kpb13] Meanwhile, the network community was inventing the same idea, but with a focus on scalability and a weak notion of reliability. When presenting this paper, it might be useful to pull up a recent technical report by Oznur Ozkasap, me and Zhen Xiao giving some overhead measurements for SRM. Network multicast remains a very hot research area. PGM is one focus of activity, and RMTP-II is a second one, but both are proprietary. SRM has issues but is free and hence widely used.

[kpb14] Our recent focus here at Cornell mixes scalable multicast with some ideas about gossip that were first studied in depth by Demers while he was at Xerox Parc.

[kpb15] Back to the systems community. Their focus, as one might have guessed, is on performance... at the microsecond and cache-stall level. These papers are mostly focused on scheduling issues.

[kpb16] Von Eicken’s work was revolutionary – he tossed out the whole traditional stack to get higher speed, and it worked too! Microsoft ships this approach as its new “winsock” architecture for NT 2000.

[kpb17] A tough call for me – I was tempted to go to my own paper, “The Next Generation Internet: Unsafe at Any Speed?” But although I’m fond of my own papers, I decided we should pursue the community which has spent the last few years struggling with perspectives involving modularity.

[kpb18] Ben proposed to focus on nomadic and mobile computing, so these references are changed from the original handout.

[kpb19] Another cute idea. Core of JNI, Sun’s new architecture for Java.

[kpb20] An endless debate for the OS community. Seems to have ended inconclusively.

[kpb21] The OS answer to Java?

[kpb22] Over the past 3 years, a major research focus. But waiting for some sort of breakthrough. The work seems sort of dull.

[kpb23] A glimpse of some past Cornell work on this topic, side by side with an “award winning” paper from 1999.

[kpb24] Everyone needs to know their basic numbers....

[kpb25] Historically, a huge topic for the OS community

[kpb26] The interest in performance led to these novel architectures. There are others but these, and RAID, were the main ones. RAID could have been a good topic for us, but we are short on lecture slots!

[kpb27] People played with modularity in the file system, but it seems to have become a dead end.

[kpb28] Nomadic and mobile computing: the next frontier. So far, much of the work focuses on the usual paradigm and tries to emulate it. Time for something completely different? I considered doing some cache consistency papers at this point but we ran out of time...

Organizational meeting

Byzantine Agreement[kpb1]

Jim Speaksman.

Novel Communications Architectures[kpb16]

No class: spring break

Modularity and costs[kpb17]

Tuple spaces[kpb19]

Scalable clusters[kpb22]

Byzantine Agreement [kpb1]

Novel Communications Architectures [kpb16]

Modularity and costs [kpb17]

Tuple spaces [kpb19]

Scalable clusters [kpb22]