Slide sets: CS514 Spring 2005

Instructors teaching from Ken's book are welcome to borrow from these slides!

(Under revision; as Ken prepares each lecture, he'll post the slides here)

1: Course overview. Cloud and edge computing. This is an overview lecture, but we'll also drill down on one example of the kind of deeper question that the course will look at in more detail, namely the challenge of building a highly available (fault-tolerant) service that can't exhibit "split brain" behavior. (Slides available in two formats: pptx. pdf)

Note for those interested in the slides: pptx is just an XML version of ppt; if you have an old PowerPoint version there is a download available from Microsoft that will enable your system to read and write this format, which is the one used in the new Office Open XML standard.

2: Web Services standards. A glimpse of the complexity of the modern Internet through the rather confusing perspective of the major standards that cloud (and other) companies have adopted. The issue here is that there are far more standards than the world can possibly know what to do with, and they overlap heavily. We'll look mostly at Web Services but also briefly at CORBA, .NET and J2EE. (pptx. pdf)

3. Cloud computing infrastructure components. The standards we discussed in Lecture 2 are mostly "superficial" in the sense that they say nothing (or little) about how to actually implement the services in the data center itself. We'll have a look at some of the typical infrastructure components one finds in a modern data center and will talk about some issues that these pose, such as tracking dynamic membership (a topic related to other concepts such as security key certificate delegation). (pptx. pdf).

4. Map Reduce: A powerful higher-level infrastructure tool. The world changed dramatically when companies like Amazon and Google emerged and began to maintain enormous data centers that harness the power of tens of thousands of computers in coordinated ways. We'll look at Map Reduce, a popular and very powerful tool for running computation on a really large scale. In addition to the slides, we have a pointer to recommended reading for this lecture. Slides: pptx. pdf.
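For readers who haven't seen the programming model before, here is a minimal single-process sketch of the Map Reduce pattern, using the usual word-count example. It is only an illustration: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this page, and a real deployment would spread the map and reduce calls over many machines and handle failures, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit one (word, 1) pair per word occurrence."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the cloud", "the edge and the cloud"])
print(counts)
```

The point of the model is that map_phase and reduce_phase have no shared state, so the framework is free to run them in parallel on whatever machines are available.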

5. Logical notions of time in distributed systems. By now, we've glimpsed some of the infrastructure of a modern cloud computing data center: how applications talk to the data center, how the data center is internally organized, some issues associated with replicating services for high availability, and one major example of the kinds of "applications" that are used to create the data those services report to their users (namely, Map Reduce). But how can one develop a mathematical framework for cloud computing? Leslie Lamport's approach was to start from the bottom and to define a model for distributed systems execution that builds on what he calls logical time and can be supported with logical clocks. We'll look at this topic. We'll also look at vector timestamps. These generalize Lamport's logical timestamps in a simple way and, at the cost of more overhead, represent time in a more accurate way. (pptx. pdf)
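A small sketch may help make the two kinds of timestamps concrete. The class and function names below are invented for this page (they are not from the slides or the book), and the rules shown are the standard textbook ones: a Lamport clock is a single counter, while a vector clock keeps one counter per process.

```python
class LamportClock:
    """Scalar logical clock: one counter per process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or message send.
        self.time += 1
        return self.time

    def receive(self, sender_time):
        # On receipt, jump past the sender's timestamp.
        self.time = max(self.time, sender_time) + 1
        return self.time


class VectorClock:
    """Vector timestamp: one counter per process in a group of size n."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def tick(self):
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, sender_v):
        # Element-wise maximum, then count the receive event itself.
        self.v = [max(a, b) for a, b in zip(self.v, sender_v)]
        self.v[self.pid] += 1
        return list(self.v)


def happened_before(a, b):
    """True if vector timestamp a is causally before b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
a = p0.tick()        # event on p0
b = p1.tick()        # independent event on p1
c = p1.receive(a)    # p1 receives a message stamped with a

la, lb = LamportClock(), LamportClock()
ta, tb = la.tick(), lb.tick()   # both events get Lamport time 1
tc = lb.receive(ta)
```

Here a and b are concurrent, and the vector timestamps show it, whereas the Lamport timestamps ta and tb are simply equal and say nothing either way. That is the sense in which vector timestamps are more accurate, at the cost of carrying n counters per message.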

6. Physical clocks and clock synchronization. Last time we discussed logical notions of time, but what about real time in cloud computing systems? We'll discuss real clocks, and the limits on the extent to which they can be synchronized. It becomes clear that time is a useful tool in data centers, but won't have a high enough resolution (accuracy) to be used at the level of individual messages being exchanged at network data rates. In fact, what we'll see is that the uncertainty in delay associated with message exchange actually limits the accuracy with which clocks can be synchronized! (pptx. pdf)
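To see why message delay bounds the achievable accuracy, consider a sketch in the style of Cristian's algorithm (one standard textbook approach, not necessarily the one used in the slides). A client notes its local time when it sends a request and when the reply arrives; the server's reading could have been taken at any point in between. The function name and the sample numbers below are made up for illustration, and times are in microseconds.

```python
def estimate_offset(t_send, t_recv, server_time):
    """Estimate how far the server clock is ahead of the local clock.

    t_send, t_recv: local clock readings when the request left and when
    the reply arrived. server_time: the server's clock reading carried
    in the reply. Assumes nothing is known about one-way delays, so the
    server reading is placed at the midpoint of the round trip.
    """
    rtt = t_recv - t_send
    estimated_server_now = server_time + rtt / 2
    offset = estimated_server_now - t_recv
    error_bound = rtt / 2   # the true offset lies within +/- this amount
    return offset, error_bound


# Hypothetical readings, in microseconds.
offset, bound = estimate_offset(
    t_send=1_000_000, t_recv=1_000_400, server_time=1_500_100
)
print(offset, bound)
```

With a 400 microsecond round trip, the estimate can be off by up to 200 microseconds in either direction, no matter how precise the clocks themselves are. Shrinking the bound means shrinking (or better characterizing) the message delay.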

7. System Management Service (I). Building a completely general service for system management. (pptx. pdf)

8. System Management Service (II). Extending the service into a powerful data replication and fault-tolerance tool. (pptx. pdf)

9. State Machine Replication and Decision Making. Using our new service to make decisions in big systems. The limitations on State Machine styles of replication. Tradeoff: with the system management service, we get flexibility and fine-grained control over the level of ordering and even persistence, but at the cost of complexity. Models like the state machine model aim at a one-size-fits-all perspective, but this brings a different form of complexity: the simple application model they presume is actually unrealistically oversimplified, and that excessive simplicity turns out to be a source of complexity to a developer (who needs to overcome it when designing solutions, in effect "squeezing" the application down to make it fit the model -- deterministic, etc). A further complication is the question of the fault-model to assume. From this we see that there really isn't any simple story -- all the options turn out to be fairly complex. (pptx. pdf).

10. Putting replication into context: Brief recap of the many models and protocols we're talking about by now. (Short slide set: ppt, pdf). Maelstrom and Ricochet: Two examples of time critical communication protocols for use between and within data centers. These protocols both use a form of XOR-based error correction coding, which is why it makes sense to talk about both at once. The Linux Red Hat community is looking at the Ricochet protocol as a future standard for data center replication and event notification, because of its good timing properties even when failures occur. Available only in pdf. See also the recommended reading.
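The basic idea behind XOR-based error correction fits in a few lines. The sketch below is not the Maelstrom or Ricochet implementation (those protocols add a good deal of machinery on top); it only shows the underlying trick, with invented function names and the simplifying assumption that all packets in a group have the same length.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets):
    """Sender side: XOR a group of packets into one repair packet."""
    return reduce(xor_bytes, packets)


def recover(received, repair):
    """Receiver side: rebuild the single missing packet in a group.

    received is the group with None in place of the lost packet.
    Returns (index_of_lost_packet, recovered_bytes).
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR repair can recover exactly one lost packet")
    present = [p for p in received if p is not None]
    # XORing the repair packet with every surviving packet cancels them
    # out and leaves only the lost one.
    return missing[0], reduce(xor_bytes, present, repair)


packets = [b"aaaa", b"bbbb", b"cccc"]
repair = make_repair(packets)
index, data = recover([b"aaaa", None, b"cccc"], repair)
print(index, data)
```

The receiver recovers the lost packet without asking the sender to retransmit, which is what makes this style of coding attractive when timing matters.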

11. Revisiting group communication: Today we're switching "levels" by talking about the perspective on group communication that the application developer will have when using a replication toolkit. This involves revisiting virtual synchrony but as a library that could be running "outboard" from the GMS (the GMS tracks group membership but doesn't handle the actual multicasts). Using lots of groups (thinking of a group as a kind of object). Toolkit concept from Isis. Other tools and toolkits: Transis, Totem, Horus, Ensemble, Paxos. Coming next: Virtually synchronous Paxos. (pptx. pdf).

12. Transactional model. In Fall 2008, we had to cut back on transactional material (normally, we do three lectures and spend a long time looking at transactional replication; this year we needed more time for cloud computing topics like Map Reduce). The book has readings on this topic. Today's lecture is more of a broad overview: ACID model, transactional 2PC, locking, "issues". (pptx. pdf).

13. eBay's Scalability Odyssey. From a talk given by Franco Travostino and Randy Shoup (eBay, Inc) at LADIS 2008. pdf.

14. Internet-Scale Efficiency. From a talk given by James Hamilton (Microsoft), at LADIS 2008. pdf.

15. Distributed Hash Tables. Everything you always dreamed of learning about finding things in very large systems. (pptx. pdf).

16. Astrolabe. A hierarchical structure for distributed state aggregation. (pptx. pdf.)

17. Overlay Networks. General overview. VPNs. Isis in the NYSE. RON. P6P. (pptx. pdf.)

18. Gossip used to build overlays. Scribe over Pastry. T-Man. Scribe is a good example of a broad class of similar systems that run some form of overlay over a DHT (van Renesse's FireFlies is another such system, but we don't have time to cover both). I'm guessing that T-Man will be less familiar. This project, at the University of Bologna in Italy, used gossip to construct a wide range of distributed topologies, which would then be used by other applications as "overlay structures". (ppt. pdf) T-Man slides.

19. Pub-sub systems constructed as overlays. Traditional publish-subscribe systems are topic-based and either run as direct multicast platforms or, as we saw with Scribe, using some form of overlay. But a second kind of "content based" publish-subscribe system has become popular in recent years. In these, the subscription is a pattern directly on the message contents. We'll look at two widely cited examples, Gryphon (an older system from IBM, but quite successful), and then Siena (a widely popular WAN solution). Time permitting, we'll also talk about Gehrke's Cayuga project, which treats published messages as database updates, and treats the subscription as a kind of standing query that does upcalls when new matches are found. Cayuga is unusual for being exceptionally scalable. (ppt. pdf).
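To make "a pattern directly on the message contents" concrete, here is a toy content-based broker. It is not the Gryphon, Siena or Cayuga interface; the Broker class, its methods and the stock-quote messages are all hypothetical. Subscriptions here are arbitrary Python predicates over a message dictionary, which keeps the sketch short; as far as I know, real systems restrict the pattern language so that matching can be done efficiently and pushed into the overlay.

```python
class Broker:
    """Toy single-node content-based publish-subscribe broker."""

    def __init__(self):
        self.subscriptions = []   # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        """Register interest in every message for which predicate is true."""
        self.subscriptions.append((predicate, callback))

    def publish(self, message):
        """Upcall every subscriber whose pattern matches the contents.

        Returns the number of subscribers the message was delivered to.
        """
        delivered = 0
        for predicate, callback in self.subscriptions:
            if predicate(message):
                callback(message)
                delivered += 1
        return delivered


broker = Broker()
inbox = []

# There is no topic name here; the subscription looks inside the message.
broker.subscribe(
    lambda m: m.get("symbol") == "IBM" and m.get("price", 0) > 100,
    inbox.append,
)

n_hit = broker.publish({"symbol": "IBM", "price": 120})
n_miss = broker.publish({"symbol": "IBM", "price": 90})
print(n_hit, n_miss, inbox)
```

Contrast this with a topic-based system, where the subscriber would name a topic such as "IBM" and receive every quote on it, filtering by price on its own.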

20. Storage using overlays. In this lecture, we'll look at P2P storage systems that run over DHTs or similar structures. The two classics were both published at SOSP in 2001: CFS (built on Chord) and PAST (built on Pastry). As it turns out the key in such systems is to be robust under stress that might include attack, and the ultimate storage system for this purpose is Maziere's Tangler, so we'll also look at that. (pdf).

21. BitTorrent. BAR Gossip. In this lecture we'll look at a new game-based perspective on P2P systems in which we imagine that the participating computers are actually playing a kind of game. Each wants to maximize its winnings and has little interest in fairness; the game designer wants to achieve some global goal. The trick is to design the system so that doing the smartest selfish thing will turn out to also advance the goals of the system as a whole. (pptx. pdf). The BAR gossip slides were provided by Lorenzo Alvisi and are kind of long; I skimmed them to stay within the 25 minutes we had available. A second slide set wasn't much shorter.

22. Sybil Attacks. In order to attack a P2P system, one common trick is to somehow amplify your limited number of attacking nodes into a seemingly larger set of machines; this gives you a numerical edge over the nodes you are trying to compromise. We'll discuss the broader implications of these kinds of attacks. (pptx. pdf).

23. Failure Detection. Detecting failures is surprisingly hard; we'll talk about why this is the case, and will try to understand what (if anything) can be done about it. (pptx. pdf).

24. Byzantine Agreement. After a long period when Byzantine Agreement was of mostly theoretical interest, computers became so fast that the protocols stopped looking impossibly slow. Since that happened, about a decade ago, much work has been done on making Byzantine Agreement into a practical tool that cloud platforms could use casually and quite widely. We'll look at this question with a focus on some of the best available protocols. (pptx. pdf).

25. Dr. Multicast: Rx for your Multicast Woes. The cloud computing community struggles with the pros and cons of allowing applications to exploit IPMC within data centers. On the positive side, you get that huge 1-n sender side benefit, and don't need a complex overlay construction. But IPMC is well known to fall apart under stress. Why do so many applications have so much trouble with IPMC, and what can the "MCMD" do to cure the sickness? (ppt. pdf).

26. Live Objects. This (our last) lecture will focus on Cornell's Live Distributed Objects technology, which many class members are using in HW3. In your HW3 project you worked on integrating cloud content with edge content; we'll touch on how this fits with the "big picture". Our technical focus, however, will be on how live objects manage to achieve scalability. (ppt. pdf).