Spring 2018 Syllabus

After each lecture, Professor Birman will link his slides to the associated lecture topic. Readings are either from the Foster/Gannon book (FG), or from Ken's book (KB), or in some cases from papers that appeared in conferences (if so, the paper itself will be listed).

The readings are strongly encouraged, but they are optional. They are a good way to learn more, or to regain clarity if you want to go back over a lecture topic you weren't completely comfortable with (or if you miss a lecture entirely). To learn the material in CS5412 it is usually a good idea to at least look at these recommended readings. Just the same, you won't ever be "tested" on something that was never covered in class, nor asked about the optional readings when presenting your project (you might be asked about topics covered in class that seem relevant to the project).

Some lectures are really part of a larger group of lectures with multiple readings, covered on multiple class meetings. In those cases the same reading won't be repeated again and again, yet we might refer back to things that the prior lecture and reading explored.

Date / Topic / Reading (optional, see note above)
1. Thu 1/25

Cloud computing overview: Edge, Network, Cloud roles.

This is just an introduction, but we'll see a LOT of terminology. The edge is basically the world of web browsers and apps (which often are best viewed as specialized web browsers). The network is more active in modern cloud systems than you might realize; we'll touch on that. And finally, we'll ask what roles the data center itself plays.

FG chapters 1, 2, 4, 5, 6.

KB chapters 1-5.

Mesos: a platform for fine-grained resource sharing in the data center. Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica, In Proceedings of the 8th USENIX conference on Networked systems design and implementation (NSDI'11). USENIX Association, Berkeley, CA, USA, 295-308.

2. Tue 1/30

How Facebook serves image requests so rapidly.

The best way to really understand the scale and complexity of the cloud is to look at something concrete, so we'll drill down on the tricks Facebook uses to instantly serve up feeds to a hundred million simultaneous users. The approach centers on caching... but at global scale (geocaching, but not in the sense of Pokémon Go...)
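
To make the idea concrete before the lecture, here is a toy Python sketch of multi-layer cache lookup (a browser cache, then an edge cache, then the origin store). Everything in it is invented for illustration; Facebook's real pipeline is vastly more elaborate.

# Toy sketch of the multi-layer lookup idea (browser cache -> edge cache ->
# origin store). All names and numbers are invented for illustration.

def make_cache():
    return {}

def lookup(photo_id, browser_cache, edge_cache, origin_store):
    """Return the photo bytes, checking the closest layers first."""
    for layer in (browser_cache, edge_cache):
        if photo_id in layer:
            return layer[photo_id]            # fast path: cache hit
    data = origin_store[photo_id]             # slow path: fetch from the data center
    edge_cache[photo_id] = data               # populate the caches on the way back
    browser_cache[photo_id] = data
    return data

origin = {"cat.jpg": b"...bytes..."}
browser, edge = make_cache(), make_cache()
lookup("cat.jpg", browser, edge, origin)      # misses everywhere, goes to the origin
lookup("cat.jpg", browser, edge, origin)      # now served from the browser cache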

An Analysis of Facebook Photo Caching. Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. 2013. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 167-181. DOI: https://doi.org/10.1145/2517349.2522722

Finding a Needle in a Haystack: Facebook's photo storage. Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. 2010. In Proceedings of the 9th USENIX conference on Operating systems design and implementation (OSDI'10). USENIX Association, Berkeley, CA, USA, 47-60.

3. Thu 2/1

Tradeoffs in the cloud. CAP Conjecture and first-tier programming.

So, how do those cloud computing systems deal with cache consistency at global scale? Drum roll.... tada tada tada. They don't. We'll learn about the claimed tradeoff between consistency, availability and partition tolerance that underlies something called Brewer's CAP conjecture. The conjecture is actually false. But weak consistency is still very common in the cloud. On the other hand, is something really "weak" if it works well enough for the intended purpose? This is like the famous tree that falls in the forest. If nobody cares ("nobody is listening"), did that tree make a loud noise? So is consistency weak if nobody checks?
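
If you want a concrete picture of what "weak" consistency looks like, here is a tiny Python sketch (not any real system's API) of a replica that is updated asynchronously and therefore can return stale data for a while.

# Minimal illustration of why a replica updated asynchronously can return
# stale data: the write reaches the primary immediately, the replica later.

primary = {"profile_photo": "old.jpg"}
replica = dict(primary)          # starts as an identical copy
pending = []                     # updates queued for asynchronous delivery

def write(key, value):
    primary[key] = value
    pending.append((key, value))     # the replica will learn about this "later"

def read_from_replica(key):
    return replica[key]

def sync():                          # eventually, the queued updates arrive
    while pending:
        k, v = pending.pop(0)
        replica[k] = v

write("profile_photo", "new.jpg")
print(read_from_replica("profile_photo"))   # -> old.jpg (stale, but fast)
sync()
print(read_from_replica("profile_photo"))   # -> new.jpg (eventually consistent)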

KB Sec. 5.3

FG chapter 2 (Sec 2.2)

A Certain Freedom: Thoughts on the CAP theorem. Eric Brewer. In Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing (PODC '10). ACM, New York, NY, USA, 335-335. DOI=http://dx.doi.org/10.1145/1835698.1835701

4. Tue 2/6

Big data analytics: Concepts and major elements.

The most important thing that the cloud does is to place advertising, so we can click it and buy awesome stuff. We knew that. So how do they figure out what you might be interested in buying? They do "big data" analysis, which involves a lot of parallel computing. We'll touch on some of the main tools.
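
As a preview, here is the classic word-count example written in the MapReduce style as a small Python sketch; it runs sequentially here, but real frameworks execute this same shape of computation in parallel across thousands of machines.

# A tiny word count in the MapReduce style: map -> shuffle/group -> reduce.

from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield (word, 1)                 # emit intermediate key-value pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)           # group values by key
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud is big", "the cloud is fast"]
print(reduce_phase(shuffle(map_phase(docs))))   # {'the': 2, 'cloud': 2, ...}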

FG Chapters 8-10

MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. Commun. ACM 51, 1 (January 2008), 107-113. DOI: https://doi.org/10.1145/1327452.1327492

5. Thu 2/8

Intelligence in the cloud: Bayesian networks, neural networks. TAO.

It turns out that machine learning has a very special role and meaning in the cloud. So we'll learn about some famous knowledge modelling approaches, but then we'll hear about another Facebook technology called TAO. Your mission (not totally impossible): figure out how TAO relates to these other concepts. Hint: It actually does have a real connection to one of them.
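
To give a rough feel for the kind of data model involved, here is a toy Python sketch loosely inspired by TAO's objects-plus-associations view of the social graph. The function names and signatures are simplified inventions for illustration, not TAO's actual API.

# Toy "social graph as key-value data" sketch: objects plus typed associations.

objects = {}        # id -> attributes
assocs = {}         # (id1, association_type) -> list of id2s

def add_object(obj_id, **attrs):
    objects[obj_id] = attrs

def add_assoc(id1, atype, id2):
    assocs.setdefault((id1, atype), []).append(id2)

def get_assocs(id1, atype):
    return assocs.get((id1, atype), [])

add_object("alice", kind="user")
add_object("photo42", kind="photo")
add_assoc("alice", "posted", "photo42")
add_assoc("photo42", "liked_by", "bob")

print(get_assocs("alice", "posted"))      # ['photo42']
print(get_assocs("photo42", "liked_by"))  # ['bob']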

FG Chapter 10

TAO: Facebook's Distributed Data Store for the Social Graph. Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani. 2013 USENIX Annual Technical Conference (USENIX ATC '13).

6. Tue 2/13

What limits performance and latency in cloud systems today?

We care a lot about speed. In fact we'll see that in a famous "A/B" experiment, delays of just 100ms were enough to drastically reduce actual clicks on advertising or purchases. So how is the cloud getting all that speed? The tricky answer is this: the cloud is fast only because of inconsistent caching! If the cloud tried to be consistent, it would be very slow. Hmm.
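
Here is the back-of-the-envelope arithmetic (with invented numbers) that explains why cache hit rates dominate perceived speed.

# expected latency = hit_rate * cache_latency + (1 - hit_rate) * backend_latency

cache_latency_ms = 2       # hypothetical in-memory cache hit
backend_latency_ms = 120   # hypothetical consistent read from the back end

for hit_rate in (0.0, 0.9, 0.99):
    expected = hit_rate * cache_latency_ms + (1 - hit_rate) * backend_latency_ms
    print(f"hit rate {hit_rate:4.0%}: ~{expected:.1f} ms per request")
# 0% -> 120.0 ms, 90% -> 13.8 ms, 99% -> 3.2 ms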

How Page Load Time Affects Bounce Rate and Page Views. Roxana Elliott. Section.io, August 10, 2017.

7. Thu 2/15

Everything is programmable. Why this matters.

To help people build ultra-fast applications, vendors who sell hardware to big data centers are making some pretty unexpected things programmable: the network interface controllers, the disks used to store data, the switches and routers, you name it. There are even reprogrammable hardware devices called FPGAs (field-programmable gate arrays): you can customize one to add homebrew "instructions" of your own design that run in a massively parallel, ultra-fast way. (If that isn't good enough, once you debug your design you can burn a real chip, an ASIC, and it will do the same thing 10x faster for you.) Why is this such a big deal? Will we all need to learn to redesign hardware in order to work on cloud computing solutions?

The Rise of the Programmable Data Center. Michael Vizard. Dice.com, 2012.

SDDC - software-defined data center. Webopedia article.

-- Tue 2/20

February Break, no classes.

In the area near Ithaca, Greek Peak is pretty good (they make snow if it hasn't snowed recently). Or you could drive a few hours and be at Whiteface (Lake Placid), Killington, Mad River Glen... don't waste the break hanging around in your dorm room. Get out on that snow and have fun! (But do sign up for a lesson if you plan to go skiing for the first time, or if you just aren't very good at it yet. We wouldn't want you to break anything).

8. Thu 2/22

Modern computers are NUMA machines. Why this matters.

OK, enough fun. Back to work. We were talking about performance, and modern computers are massively parallel supercomputers with as many as 100 CPU cores on each board. Can we get superfast cloud services just by throwing lots of cores at every computing task? (You learned how to program with threads in your Java class, so if that's the answer, you have it made.) Well, tough luck... that isn't going to work. We'll find out why, by looking at how a really famous multi-threaded program (the Linux kernel!) was extended to improve NUMA performance and scalability.
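
A quick taste of why more cores stop helping: the Amdahl's-law arithmetic below (with an illustrative 5% serial fraction, a number chosen purely for the example) shows 100 cores delivering nowhere near a 100x speedup.

# Amdahl's law: speedup = 1 / (serial_fraction + (1 - serial_fraction) / cores)

def speedup(cores, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (1, 8, 32, 100):
    print(cores, "cores ->", round(speedup(cores, 0.05), 1), "x speedup")
# 1 -> 1.0x, 8 -> 5.9x, 32 -> 12.5x, 100 -> 16.8x (nowhere near 100x)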

Then, we'll discuss the concept of "transactional" computing as it applies to NUMA scalability. In fact the technique isn't a big success, and we won't discuss it at great length, but I want you to see it partly because in cloud computing, companies and researchers often promise the moon but then can't really deliver. The promise in this case was the 1993 transactional memory (TM) paper; the reality after more than a decade of work (as we will see when we discuss ProteusTM) is much more complicated, and also more limited, than the original researchers could have expected!
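
For a feel of the transactional idea, here is a toy optimistic-concurrency sketch in Python: speculate, then commit only if nothing you read has changed, otherwise retry. It is a single-machine illustration of the style, not the hardware TM of the 1993 paper and not ProteusTM.

# Optimistic "transaction": read a version, compute speculatively, commit
# only if the version is unchanged, otherwise retry.

import threading

store = {"counter": 0}
versions = {"counter": 0}
commit_lock = threading.Lock()      # stands in for the atomic commit step

def atomic_increment(key):
    while True:
        seen_version = versions[key]
        new_value = store[key] + 1          # speculative computation
        with commit_lock:
            if versions[key] == seen_version:   # nothing changed: commit
                store[key] = new_value
                versions[key] += 1
                return
        # another "transaction" won the race: retry

threads = [threading.Thread(target=atomic_increment, args=("counter",)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(store["counter"])   # 10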

An Analysis of Linux Scalability to Many Cores. S. Boyd-Wickizer, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich. 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI '10), Oct. 4-6, 2010, Vancouver, BC, Canada.

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors. Austin Clements, Frans Kaashoek, Nickolai Zeldovich and Robert Morris. ACM TOCS 32:4 (Jan 2015).

Transactional memory: architectural support for lock-free data structures. Maurice Herlihy and J. Eliot B. Moss. 1993. In Proceedings of the 20th annual international symposium on computer architecture (ISCA '93). ACM, New York, NY, USA, 289-300. DOI=http://dx.doi.org/10.1145/165123.165164

ProteusTM: Abstraction Meets Performance in Transactional Memory. Diego Didona, Nuno Diegues, Anne-Marie Kermarrec, Rachid Guerraoui, Ricardo Neves, and Paolo Romano. 2016. SIGARCH Comput. Archit. News 44, 2 (March 2016), 757-771. DOI: https://doi.org/10.1145/2980024.2872385

9. Tue 2/27

Key-value storage concept. General idea, specializing it for clouds.

Key-value storage is the "key" to scalability in the cloud (sorry, couldn't resist). What's the main idea? Where did it come from, and what systems use this concept? How can it be specialized for tier 2 of the cloud?
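
Here is a minimal Python sketch of consistent hashing, the idea at the heart of Chord and of many cloud key-value stores: hash both nodes and keys onto a ring, and store each key at the first node clockwise from the key's position. The node names are made up for the example.

import hashlib
from bisect import bisect_right

def ring_position(name, ring_size=2**32):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ring_size

nodes = ["node-a", "node-b", "node-c"]
ring = sorted((ring_position(n), n) for n in nodes)   # positions on the ring

def owner(key):
    pos = ring_position(key)
    idx = bisect_right([p for p, _ in ring], pos) % len(ring)   # first node clockwise
    return ring[idx][1]

for key in ("user:17", "photo:42", "cart:99"):
    print(key, "->", owner(key))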

Chord: A scalable peer-to-peer lookup service for internet applications. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. 2001. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01). ACM, New York, NY, USA, 149-160. DOI=http://dx.doi.org/10.1145/383059.383071

11. Thu 3/1

Transactional database model. Part I of a 2-part lecture.

Many programmers like to think of key-value storage very much like relational databases: each key is kind of like an index, and each value could be a row, like in a relation. So to what degree can key-value storage systems be treated like SQL systems? We'll start by learning all about databases and transactions.
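
As a small worked example of the transactional (ACID) model, here is a money transfer using Python's built-in sqlite3 module: both updates commit together or not at all.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
db.commit()

try:
    with db:   # opens a transaction; commits on success, rolls back on error
        db.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        db.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
except sqlite3.Error:
    pass       # on failure, neither update becomes visible

print(db.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 30)]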

KB Chapter 20

11. Tue 3/6

Transactional key-value systems: how do they work? Part II of lecture 10.

Continuing on our theme, what happens when we try to apply this transactional concept to a key-value storage layer in the cloud? Hint: performance collapses! Oh no!

This is a lead-in for a topic we'll revisit in lecture 14 (FaRM).

12. Thu 3/8

SQL versus NoSQL for key-value systems in the cloud.

Saving ourselves from disaster, we can solve the problem we ran into in lecture 11 using a new model called the "NoSQL" model. There is even a programming style called BASE to get us there. How does it work?
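
One common BASE-style tactic, shown here as a greatly simplified Python sketch in the spirit of Dynamo: let replicas accept writes independently and reconcile later, in this case with a last-writer-wins rule based on timestamps (real systems use richer conflict-resolution machinery).

def write(replica, key, value, timestamp):
    current = replica.get(key)
    if current is None or timestamp > current[1]:
        replica[key] = (value, timestamp)

def reconcile(r1, r2):
    for key in set(r1) | set(r2):
        candidates = [r.get(key) for r in (r1, r2) if r.get(key)]
        winner = max(candidates, key=lambda vt: vt[1])
        r1[key] = r2[key] = winner        # both replicas converge on the winner

east, west = {}, {}
write(east, "cart", ["book"], timestamp=1)         # accepted in one data center
write(west, "cart", ["book", "pen"], timestamp=2)  # conflicting write elsewhere
reconcile(east, west)
print(east["cart"])    # (['book', 'pen'], 2) -- the later write wins on both sides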

KB Sec. 1.6, Sec 5.3

BASE: An ACID Alternative. Dan Pritchett, eBay. ACM Queue.

Dynamo: Amazon's Highly Available Key-Value Store. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. SIGOPS Oper. Syst. Rev. 41, 6 (October 2007), 205-220. DOI: https://doi.org/10.1145/1323293.1294281

Wikipedia Article on NoSQL.

13. Tue 3/13

Hardware for high speed cloud networking (RDMA) and storage (SSD).

To take the next big step, cloud computing people had to focus on performance accelerators that center on special hardware. We'll learn about the hardware first.

Congestion Control for Large-Scale RDMA Deployments. Yibo Zhu, Haggai Eran, Daniel Firestone, Chuanxiong Guo, Marina Lipshteyn, Yehonatan Liron, Jitendra Padhye, Shachar Raindel, Mohamad Haj Yahia, and Ming Zhang. 2015. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, New York, NY, USA, 523-536. DOI: https://doi.org/10.1145/2785956.2787484

RDMA over Commodity Ethernet at Scale. Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitu Padhye, and Marina Lipshteyn. 2016. In Proceedings of the 2016 ACM SIGCOMM Conference (SIGCOMM '16). ACM, New York, NY, USA, 202-215. DOI: https://doi.org/10.1145/2934872.2934908

RIPQ: Advanced Photo Caching on Flash for Facebook. Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li. 2015. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST'15). USENIX Association, Berkeley, CA, USA, 373-386.

14. Thu 3/15

How do key-value systems use RDMA? Part I: FaRM

Ok... we have the goal (SQL on key-value stores). We have the hardware (RDMA, SSD). Can we pull off our magic? FaRM is the first system to really try. It works remarkably well for small value objects (if you have a big object, you can just store it in the file system and put the URL into FaRM).
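
That parenthetical remark is itself a useful pattern. Here is a toy Python sketch of it (the names and threshold are invented; this is not FaRM's API): keep bulky data in a blob or file store and put only a small pointer record in the fast key-value tier.

import os, tempfile

kv_store = {}                       # stands in for the fast key-value tier

def put(key, data, big_threshold=64):
    if len(data) <= big_threshold:
        kv_store[key] = ("inline", data)
    else:
        path = os.path.join(tempfile.gettempdir(), f"blob-{key.replace(':', '_')}")
        with open(path, "wb") as f:         # bulk data goes to slower storage
            f.write(data)
        kv_store[key] = ("pointer", path)   # only the pointer stays in the KV tier

def get(key):
    kind, payload = kv_store[key]
    if kind == "inline":
        return payload
    with open(payload, "rb") as f:
        return f.read()

put("thumb:1", b"tiny")
put("video:1", b"x" * 10_000)
print(get("thumb:1"), len(get("video:1")))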

FaRM: Fast Remote Memory. Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. 2014. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI'14). USENIX Association, Berkeley, CA, USA, 401-414.

15. Tue 3/20

How do key-value systems use RDMA? Part II: HeRD/FaSST

This is academic research. At CMU, a research team showed that what FaRM is doing might be as much as 3-5x slower than the best possible performance! To demonstrate this, they got very fancy in their use of RDMA. We'll see how they did it (warning: geeky lecture, completely off the edge...)

Using RDMA Efficiently for Key-Value Services. Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2014. In Proceedings of the 2014 ACM conference on SIGCOMM (SIGCOMM '14). ACM, New York, NY, USA, 295-306. DOI: https://doi.org/10.1145/2619239.2626299

FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2016. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 185-201.

16. Thu 3/22

Data replication: Lots of copies versus atomic multicast.

Enough with the key-value stuff. We'll start to talk about a different topic, although it isn't totally unrelated. In file systems and key-value stores, we usually need to replicate data at least once to be sure we won't lose it if something fails -- and in the cloud, something always fails if you wait long enough. So given that failure is inevitable, what can we say about making copies? Is there a "theory of replication"? Hint: Of course there is...
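
The "theory" starts from a very simple observation, sketched below in Python: if every replica starts in the same state and applies the same commands in the same order, the replicas stay identical. Agreeing on that order despite failures is the hard part, and that is what the protocols in the next lectures address.

class Replica:
    def __init__(self):
        self.balance = 0
    def apply(self, command):
        op, amount = command
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw":
            self.balance -= amount

agreed_log = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]   # same order everywhere

replicas = [Replica() for _ in range(3)]
for command in agreed_log:
    for r in replicas:
        r.apply(command)

print([r.balance for r in replicas])   # [75, 75, 75] -- identical state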

Replication management using the state-machine approach. Fred B. Schneider. In Distributed systems (2nd Ed.), Sape Mullender (Ed.). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA 169-197.

Chain Replication for Supporting High Throughput and Availability. Robbert van Renesse, Fred. B. Schneider. Sixth Symposium on Operating Systems Design and Implementation (OSDI 04). December 2004, San Francisco, CA.

17. Thu 3/22

Take home midterm assigned at 5pm. Due Friday 3/23, 5pm.

This exam will focus on material from lectures 9-15, although we consider the course to be a single story, and answers to our questions could reflect things we learned about in lectures 1-8. You won't be tested on replication. The test is a take-home exam, and you have 24 hours in which to do it. We expect you to work on your own, open book but no discussion of the exam with anyone else, please. This format of exam is new in 2018, and we won't be able to give you any kind of sample exams to use in preparing. Instead, just focus on really understanding everything from the lectures we listed, especially lectures 9-15.

18. Tue 3/27

The Paxos model for multicast and for persistent storage logs.

The famous "theory of replication" is often known by the name of the family of protocols Leslie Lamport proposed for solving it: Paxos, which solves the problem of "state machine replication" (SMR). We'll learn about two Paxos variants: one usually employed for sending messages reliably from one sender to multiple receivers, and another for storing data into some kind of log that might be on a disk.
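
One small piece of the intuition behind Paxos, illustrated below in Python: with N acceptors, any two majority quorums overlap in at least one member, so two conflicting decisions can never both be accepted by a majority.

from itertools import combinations

N = 5
acceptors = set(range(N))
majority = N // 2 + 1

quorums = [set(q) for q in combinations(acceptors, majority)]
assert all(q1 & q2 for q1 in quorums for q2 in quorums)   # every pair intersects
print(f"N={N}: all {len(quorums)} majority quorums pairwise intersect")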

KB Chapter 11

The Part-time Parliament. Lamport, L. ACM Trans. Comput. Syst. 16, 2 (May 1998), 133-169.

Vertical Paxos and primary-backup replication. Lamport, L., Malkhi, D., and Zhou, L. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (PODC '09), ACM, pp. 312-313.

19. Thu 3/29

Corfu, a Paxos-based logging solution. vCorfu.

Now that we know the theory, we'll see how VMware (the most famous company for cloud-scale virtualization tools) uses it to create a product called Corfu and an extended version they call vCorfu. Cool fact: although Corfu does use Paxos, it also uses a Cornell-created technology called chain replication. We'll learn how that works, too.
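
Here is a minimal Python sketch of the chain replication idea: writes enter at the head and flow down the chain, while reads are served by the tail, which only holds fully replicated data. Real chain replication also handles failures and reconfiguration, which this toy ignores.

class Node:
    def __init__(self, name):
        self.name, self.store, self.next = name, {}, None
    def write(self, key, value):
        self.store[key] = value
        if self.next:                        # forward down the chain
            self.next.write(key, value)

head, middle, tail = Node("head"), Node("middle"), Node("tail")
head.next, middle.next = middle, tail

def client_write(key, value):
    head.write(key, value)                   # all writes start at the head

def client_read(key):
    return tail.store.get(key)               # all reads are served by the tail

client_write("x", 42)
print(client_read("x"))    # 42 -- by the time the tail has it, every node does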

CORFU: A distributed shared log. Mahesh Balakrishnan, Dahlia Malkhi, John D. Davis, Vijayan Prabhakaran, Michael Wei, and Ted Wobber. ACM Trans. Comput. Syst. 31, 4, Article 10 (December 2013), 24 pages. DOI=http://dx.doi.org/10.1145/2535930

-- Tue 4/3 Spring Break, no classes.

-- Thu 4/5 Spring Break, no classes.

20. Tue 4/10

Derecho: high speed RDMA multicast and replication.

Every professor always sneaks in their own favorite recent project sometime in the semester. We'll hear all about the amazing Derecho data replication library. Cool fact: Derecho is by far the world's fastest version of Paxos. Not only that, it is also a "constructively optimal" protocol: the theory of replication predicts that you cannot build a protocol faster than certain limits, and Derecho achieves those limits! For cloud computing, this is about as theoretical as it gets. But Derecho is a real software library, and in fact many people will be using it in their class projects.

Building Smart Memories and Scalable Cloud Services with Derecho. Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Matthew Milano, Weijia Song, Edward Tremel, Sydney Zink, Kenneth P. Birman, and Robbert van Renesse. Submitted for publication, September 2017.

21. Thu 4/12

GPU clusters. CUDA and Dandelion.

A lot of modern cloud computing involves some form of graphics, vision or video processing. People do this using GPU clusters, which are traditionally programmed using a very specialized programming language called CUDA. There has been a push to make GPUs look a bit more like general purpose computing tools, and Dandelion is a Microsoft language intended to replace CUDA. We'll see how it works. The one problem: a 10x slowdown.

CUDA C/C++ Basics. NVIDIA presentation at Supercomputing 2011.

Dandelion: a compiler and runtime for heterogeneous systems. Christopher J. Rossbach, Yuan Yu, Jon Currey, Jean-Philippe Martin, and Dennis Fetterly. 2013. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 49-68. DOI: https://doi.org/10.1145/2517349.2522715

22. Tue 4/17

Spark/Databricks (RDDs). Tensorflow.

If you have never heard of Spark, Databricks, and Tensorflow, you are just not nearly as geeky as you think. But don't worry! After this lecture, you'll be ready for the next party. We'll learn so many acronyms your head will spin... but the technology is genuinely cool. These are the core platforms used in modern cloud settings for machine learning and data processing.
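
As a preview, here is the lecture-4 word count again, this time written against Spark's RDD API in Python. This sketch assumes a local pyspark installation; the chain of transformations is lazy and only executes when collect() is called.

from pyspark import SparkContext

sc = SparkContext("local", "wordcount-sketch")
lines = sc.parallelize(["the cloud is big", "the cloud is fast"])

counts = (lines.flatMap(lambda line: line.split())   # RDD of words
               .map(lambda word: (word, 1))          # RDD of (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))     # combine counts per word

print(counts.collect())    # [('the', 2), ('cloud', 2), ...]
sc.stop()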

Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud 2010.

Improving MapReduce Performance in Heterogeneous Environments, M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz and I. Stoica, OSDI 2008, December 2008.

TensorFlow: A System for Large-Scale Machine Learning. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265-283.

23. Thu 4/19

Fog and edge computing: Will it be the next big thing?

So... we finally start to get to the secret agenda of the entire semester! When a cloud touches the ground, we get fog. So naturally, when the cloud reaches out to the edge, we get... drumroll... fog computing! We'll learn about this exciting new trend.

FG Chapter 9

Wikipedia Article on Fog Computing.

24. Tue 4/24

Smart homes... smart highways... smart power grids...

We all hear about how smart everything is going to be, and many students in CS5412 will play a role in creating these new systems. Where does cloud computing enter the picture?

Wikipedia Articles on Smart Homes, Smart Highways, Smart Grid.
25. Thu 4/26

How smart will smart devices really be?

One puzzle is this: how much of the machine learning will really live in the sensors themselves?

Wikipedia Article on Smart Devices.

26. Tue 5/1

Smart memory systems with Derecho

Did I remember to mention that Derecho is the most amazing technology ever? Well, just in case I forgot, we'll revisit Derecho, focusing now on how it could integrate with fog computing to become a new kind of storage layer with machine learning built right in: a "smart memory". You use it just like any file system, but the files are created by machine learning code. We'll also learn about the handling of real time in the Freeze Frame File System, and how those same ideas will migrate into Derecho to create FFFS-v2. Finally, we will talk a little about the security issues that arise when connecting to sensors. Obviously the sensors need to use secure networking, but there is a deeper question of why a sensor should trust the data collector on the cloud (and vice versa), and of how each learns its configuration: who to connect to, how the sensor should be set up, and so on. Then one worries about why we should trust the sensor's notion of time, or its claims about where it is located and how it is oriented.

[This lecture may be given by Edward Tremel; Ken potentially out of town]

See readings for lecture 20.

27. Thu 5/3

Integrating a smart memory system with Spark/Databricks. RDDs.

The puzzle Derecho creates is this: if the memory system is so smart, what role is left for Spark/Databricks, or for Tensorflow? Focusing on Spark, we'll see that there is a good opportunity to combine the concepts.

[This lecture may be given by Theo Gkountouvas; Ken potentially out of town]

See readings for lectures 4, 20.

28. Tue 5/8

Wide area challenges. Concept of mirroring. Limitations.

Last lecture of the semester! In this lecture, our topic will be wide-area systems. In the cloud, there are lots of data centers, and most applications are like the Facebook image system, with global span. How do the subsystems talk to one another over the wide-area network (WAN)? We'll see that the trick centers on read-only mirroring, but that applications need to be very latency-tolerant to make use of this model. We'll also learn about yet another Cornell technology called the Smoke and Mirrors File System, which offers a surprising way to get a bit of extra performance out of those WAN links.

Smoke and Mirrors: Reflecting Files at a Geographically Remote Location Without Loss of Performance. Hakim Weatherspoon, Lakshmi Ganesh, Tudor Marian, Mahesh Balakrishnan (MSR), and Ken Birman. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST '09). San Francisco, CA. February 24-27, 2009.

-- 5/9-5/11

Final project demos should be scheduled in this period.

Everyone will need to schedule a demo for their work, and will have to prepare a poster. The whole team should be there for the demo. We spread these over a few days.