Spring 2018 Syllabus

Notice that there are links to the slides in the syllabus, so you can make notes right on a copy of the slides if you wish.

Readings are either from the Foster/Gannon book (FG), or from Ken's book (KB), or in some cases from papers that appeared in conferences (if so, the paper itself will be listed).

The readings are strongly encouraged, and yet are still optional. They are a good way to learn more, or to gain clarity if you want to go back over a lecture topic that you weren't completely comfortable with from lecture (or if you miss a lecture entirely). To learn the material in CS5412 it would usually be a good idea to at least look at these recommended readings. Just the same, you won't ever be "tested" on something that requires knowing things that were never covered in class, or asked about optional readings when presenting your project (you might be asked about topics covered in class that seem relevant to the project).

Some lectures are really part of a larger group of lectures with multiple readings, covered on multiple class meetings. In those cases the same reading won't be repeated again and again, yet we might refer back to things that the prior lecture and reading explored.

Ken will post the slides for each lecture within a day or two after that lecture.

  Date / Topic / Reading (optional, see note above)
1. Thu 1/25

Cloud computing overview: Edge, Network, Cloud roles. 

This is just an introduction, but we'll see a LOT of terminology. The edge is basically the world of web browsers and apps (which often are best viewed as specialized web browsers). The network is more active in modern cloud systems than you might realize; we'll touch on that. And finally, we'll ask what jobs the data center itself plays.

A big part of our focus in 2018 is the shift from a "classic" datacenter style used purely for web queries towards a more dynamic one in which data evolves rapidly at the edge, or in some form of physical world.  Facebook is the one cloud computing company that already uses this style of computing heavily, so in 2018 we'll draw a lot of examples from the way Facebook's edge is able to stay up to date and performant.  But we will also see examples from Google, Microsoft Azure, and eBay.

Along the way we'll hear about a big trend in cloud computing: the move from virtualization to containers.  Mesos was the first system to do this.  It matters because one of the puzzles for edge computing centers on leveraging fast hardware in ways that seem to be incompatible with heavy virtualization of the kind seen on Amazon.com and supported by VMware.  Mesos and container virtualization eliminated this problem by getting rid of one especially costly level of "abstraction".

FG chapters 1, 2, 4, 5, 6, 9.

KB chapters 1-5.

Wikipedia Article on Fog Computing.

Mesos: a platform for fine-grained resource sharing in the data center. Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica, In Proceedings of the 8th USENIX conference on Networked systems design and implementation (NSDI'11). USENIX Association, Berkeley, CA, USA, 295-308.


2. Tue 1/30

How Facebook serves image requests so rapidly.

The best way to really understand the scale and complexity of the cloud is to look at something concrete, so we'll drill down on the tricks Facebook uses to instantly serve up feeds to a hundred million simultaneous users. It centers on caching... but at global scale (geocaching, though not in the Pokémon Go sense...)
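The central mechanism here, a cache that evicts its least recently used entries, can be sketched in a few lines of Python (a toy sketch only; the names and tiny capacity are invented for illustration, and Facebook's real stack layers browser, edge, and datacenter caches):

```python
from collections import OrderedDict

class LRUCache:
    """Toy least-recently-used cache, loosely analogous to one layer
    of a multi-tier photo-serving cache."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None              # miss: would fetch from the next tier
        self.items.move_to_end(key)  # mark as recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("photo1", b"jpeg-bytes-1")
cache.put("photo2", b"jpeg-bytes-2")
cache.get("photo1")                   # photo1 is now most recently used
cache.put("photo3", b"jpeg-bytes-3")  # capacity exceeded: photo2 is evicted
```

The papers below study exactly this kind of policy question (what to keep, what to evict, and at which tier) at Facebook's scale.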

An Analysis of Facebook Photo Caching. Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. 2013. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 167-181. DOI: https://doi.org/10.1145/2517349.2522722

RIPQ: Advanced Photo Caching on Flash for Facebook. Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li. 2015. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST'15). USENIX Association, Berkeley, CA, USA, 373-386.

Finding a Needle in a Haystack: Facebook's photo storage. Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. 2010. In Proceedings of the 9th USENIX conference on Operating systems design and implementation (OSDI'10). USENIX Association, Berkeley, CA, USA, 47-60.

3. Thu 2/1

Tradeoffs in the cloud. CAP Conjecture and first-tier programming.

We heard about CAP in the first lecture. In this lecture we'll learn about the theory underlying CAP. Some think of CAP as a theorem, but it is more like a conjecture, and in fact it is false. But weak consistency is still very common in the cloud. On the other hand, is something really "weak" if it works well enough for the intended purpose? This is like the famous tree that falls in the forest: if nobody is listening, did that tree make a loud noise? So is consistency weak if nobody checks?

We will also learn about Lamport's famous model for consistency (his causal ordering "happens before" relation, and his concept of a consistent cut or snapshot).
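The happens-before idea is easy to render in code. Here is a minimal sketch of Lamport's logical clocks (the `Process` class and the scenario are invented for illustration): each event increments a local counter, and receiving a message pushes the receiver's clock past the timestamp the message carries, so causally related events always get increasing timestamps.

```python
class Process:
    """Sketch of Lamport's logical clocks."""
    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock            # timestamp carried on the message

    def receive(self, msg_ts):
        # Jump past the sender's timestamp, preserving happens-before.
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock

p, q = Process("p"), Process("q")
p.local_event()        # p's clock: 1
ts = p.send()          # p's clock: 2; message carries timestamp 2
q.local_event()        # q's clock: 1
q.receive(ts)          # q's clock: max(1, 2) + 1 = 3
```

Note the ordering is only one-way: if event a happens before event b, then a's timestamp is smaller, but not conversely.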

KB Sec. 1.6, Sec. 5.3, Chapter 10, Sec. 10.2

FG chapter 2 (Sec 2.2)

How Page Load Time Affects Bounce Rate and Page Views. Roxana Elliott. Section.io, August 10, 2017.

A Certain Freedom: Thoughts on the CAP theorem. Eric Brewer. In Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing (PODC '10). ACM, New York, NY, USA, 335-335. DOI=http://dx.doi.org/10.1145/1835698.1835701

BASE: An ACID Alternative. Dan Pritchett, eBay. ACM Queue.

Wikipedia Article on NoSQL.

Time, clocks, and the ordering of events in a distributed system.  Leslie Lamport. Commun. ACM 21, 7 (July 1978), 558-565. DOI=http://dx.doi.org/10.1145/359545.359563

Distributed snapshots: determining global states of distributed systems. K. Mani Chandy and Leslie Lamport. ACM Trans. Comput. Syst. 3, 1 (February 1985), 63-75. DOI=http://dx.doi.org/10.1145/214451.214456

4. Tue 2/6 How the need for fast edge computing will force the first tier to evolve.

Today, the first-tier is often very stale (CAP).  But fog computing systems will force us to move accurate versions of some forms of data to the edge, into "data concentrators" that can take immediate actions if necessary.  We'll look at the delays associated with each layer of the cloud, and then will look specifically at how the file systems used in today's cloud sometimes embody CAP-like limitations.  But it turns out that we can also improve these behaviors using Lamport's ideas.
The Freeze-Frame File System. Weijia Song, Theo Gkountouvas, Qi Chen, Zhen Xiao, Ken Birman. ACM Symposium on Cloud Computing (SoCC 2016). Santa Clara, CA, October 5-7, 2016.
5. Thu 2/8

Intelligent behavior in today's cloud edge. TAO.

It turns out that CAP isn't a universal law in the cloud.  In this lecture we will learn about another Facebook technology called TAO.  TAO has a "mostly" consistent copy of the portion of the social-networking graph that gets used for any queries from the region, and it uses write-through caching to update that local copy when something changes.  Then it has a back-end that gets updated in batches.  This offers another illustration of how edge computing systems might actually be able to use cloud scalability techniques without abandoning consistency.  TAO isn't totally consistent, but in fact it is relatively rare that applications could notice.
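The write-through pattern at the heart of TAO can be sketched simply (a toy model; real TAO caches a social graph, batches its back-end updates asynchronously, and spans many regions):

```python
class WriteThroughCache:
    """Sketch of TAO-style write-through caching: a write updates both
    the local cache and the backing store, so subsequent local reads
    stay fresh. (Real TAO applies back-end updates in batches; this
    toy writes through synchronously.)"""
    def __init__(self, store):
        self.store = store   # shared backing store (a dict as stand-in)
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)   # miss: fetch from the back end
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.cache[key] = value       # update the local copy...
        self.store[key] = value       # ...and write through to the back end

backend = {}
region = WriteThroughCache(backend)
region.write("alice:friends", ["bob"])
```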

FG Chapters 8-10

TAO: Facebook's Distributed Data Store for the Social Graph. Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani. 2013 USENIX Annual Technical Conference (USENIX ATC '13).

6. Tue 2/13

State Machine Replication.  Paxos.  Derecho.

One dimension of consistency centers on data replication.  We mentioned this topic in previous lectures, but it deserves a closer look.  In this lecture we will learn about the state machine replication problem, and why it matters.  Then we will look at Leslie Lamport's famous Paxos protocol, which is constructed using a concept called two-phase commit.  Last, we will learn about the virtual synchrony model for dynamically managing the membership of a replication group.  This is what Cornell's new Derecho system implements (virtual synchrony + Paxos); some people will be using it in their projects.  In Derecho, virtual synchrony is automatic, and replication can be used for replicating data, key-value storage, computational tasks and configuration parameters.  Moreover, the same mechanisms provide for distributed coordination and synchronization.
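The two-phase commit building block mentioned above can be sketched as follows (illustrative only; real Paxos and Derecho must also survive coordinator failure, message loss, and membership changes, which is where most of the actual complexity lives):

```python
class Replica:
    """Toy replica that votes in a two-phase commit."""
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.state = None
        self.pending = None

    def prepare(self, update):
        if not self.healthy:
            return False          # vote "no"
        self.pending = update     # stage the update, vote "yes"
        return True

    def commit(self):
        self.state = self.pending

    def abort(self):
        self.pending = None

def two_phase_commit(update, replicas):
    """Phase 1 asks every replica to prepare; only if all vote yes
    does phase 2 commit, otherwise everyone aborts."""
    votes = [r.prepare(update) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit()
        return "committed"
    for r in replicas:
        r.abort()
    return "aborted"

group = [Replica(), Replica(), Replica()]
two_phase_commit("x=1", group)            # all healthy: commits
group.append(Replica(healthy=False))
two_phase_commit("x=2", group)            # one "no" vote: aborts
```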

KB Chapter 10, 11.

Replication management using the state-machine approach. Fred B. Schneider. In Distributed systems (2nd Ed.), Sape Mullender (Ed.). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA 169-197.

The Part-Time Parliament. Leslie Lamport. ACM Trans. Comput. Syst. 16, 2 (May 1998), 133-169.

Building Smart Memories and Cloud Services with Derecho. Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Matthew Milano, Weijia Song, Edward Tremel, Sydney Zink, Kenneth P. Birman, Robbert van Renesse. Submitted for publication, September 2017.
7. Thu 2/15 So: RDMA is fast, and Derecho used RDMA.  What's so hard about that? 

We'll look really closely at some of the challenges of getting Derecho to run at the full possible speed allowed by RDMA.  Along the way we'll hear about some classic technology disasters of the early cloud computing era, like Ethernet multicast storms caused by Bloom filter overflows (a very technical-sounding term for a pretty dramatic kind of failure), and we'll see that RDMA isn't as simple a story as one might wish.  There are many lessons here about next-generation data center acceleration hardware!

The Rise of the Programmable Data Center. Michael Vizard. Dice.com, 2012.

SDDC - software-defined data center. Webopedia article.

Congestion Control for Large-Scale RDMA Deployments. Yibo Zhu, Haggai Eran, Daniel Firestone, Chuanxiong Guo, Marina Lipshteyn, Yehonatan Liron, Jitendra Padhye, Shachar Raindel, Mohamad Haj Yahia, and Ming Zhang. 2015. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, New York, NY, USA, 523-536. DOI: https://doi.org/10.1145/2785956.2787484

RDMA over Commodity Ethernet at Scale. Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitu Padhye, and Marina Lipshteyn. 2016. In Proceedings of the 2016 ACM SIGCOMM Conference (SIGCOMM '16). ACM, New York, NY, USA, 202-215. DOI: https://doi.org/10.1145/2934872.2934908

-- Tue 2/20

February Break, no classes.

If you'll spend the break in Ithaca, Greek Peak is pretty good (they make snow if it hasn't snowed recently). Or you could drive a few hours and be at Whiteface (Lake Placid), Killington, Mad River Glen... don't waste the break hanging around in your dorm room. Get out on that snow and have fun! (But do sign up for a lesson if this is your first time).

... and hey, if skiing doesn't appeal, it is never too cold for a quick swim!

8. Thu 2/22

Locking is slow on NUMA machines. Why this matters and what we can learn. 

As we push deeper into the applications that might run on the edge, we've seen that zero-copy communication might be harder than it sounds.  But this is just one of a number of issues.  A surprisingly big puzzle is that to leverage the power of NUMA machines, we basically need to completely avoid locking and for that matter, sharing data between cores: by and large, we want each core to have its own copy of any interesting data, and run almost entirely lock-free!

We'll see this play out by looking at the evolution of Linux on NUMA platforms, and how they evolved the locking layer.

Then in lectures 9-13, we will revisit a similar question of coordinated access to shared memory, but in a distributed setting using massive scalable storage structures: key-value data stored using distributed hash tables. 
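The "per-core copy, combine on read" idea can be sketched with a sharded counter (illustrative Python; because of Python's global interpreter lock this shows the structure of the technique rather than a real NUMA speedup):

```python
import threading

NUM_SHARDS = 4

class ShardedCounter:
    """Sketch of the per-core-data idea: instead of one lock-protected
    counter that every thread fights over, each thread updates its own
    shard, and reads combine the shards."""
    def __init__(self):
        self.shards = [0] * NUM_SHARDS
        self.locks = [threading.Lock() for _ in range(NUM_SHARDS)]

    def add(self, thread_id, n=1):
        i = thread_id % NUM_SHARDS     # each thread touches its own shard
        with self.locks[i]:            # so this lock is never contended
            self.shards[i] += n

    def total(self):
        return sum(self.shards)        # reads pay the cost of combining

counter = ShardedCounter()
threads = [threading.Thread(target=lambda t=t: [counter.add(t) for _ in range(1000)])
           for t in range(NUM_SHARDS)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

The tradeoff is the one the NUMA papers below explore: updates become cheap and contention-free, while a globally consistent read becomes the expensive operation.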

Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems. Kishore Kumar Pusukuri, Rajiv Gupta, and Laxmi N. Bhuyan. 2014.  In Proceedings of the 23rd international conference on Parallel architectures and compilation (PACT '14). ACM, New York, NY, USA, 289-300.   Slide set.

An Analysis of Linux Scalability to Many Cores. S. Boyd-Wickizer, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich. 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI '10), Oct. 4-6, 2010, Vancouver, BC, Canada.

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors. Austin Clements, Frans Kaashoek, Nickolai Zeldovich and Robert Morris. ACM TOCS 32:4 (Jan 2015).

Decoupling contention management from scheduling, Ryan Johnson, Radu Stoica, Anastasia Ailamaki, Todd C. Mowry. ASPLOS 2010, March 13-17.


9. Tue 2/27 Transactional Model: Local database on one machine, across several machines, and for NUMA memories.  

We learned about concurrent memory access and locking in NUMA settings in lecture 8.  Lecture 9 shifts from a NUMA setting to a similar set of questions about distributed memory, but it takes a few lectures to cover the distributed topic.

For us, locking in distributed settings is really an aspect of a broader "model" called "transactional atomicity."  Transactions are one of the most universal and important concepts in computing.  They originated in database systems as a way to simplify concurrent database access; we'll discuss this briefly and outline the basics.  They were then extended to multiple machines using the two-phase commit concept, which we will also outline, and in recent years they have been applied as a tool for concurrent programming against some form of memory shared by a set of concurrent threads.

This last question is best appreciated in historical order, so today we'll look at multithreaded concurrency in NUMA machines and then we'll pause.  In the next lectures we will see that transactions could also be applied to a DHT if it acted sufficiently similarly to a NUMA memory, and that this is in fact entirely feasible: two famous systems use RDMA hardware (the same as was used in Derecho) to achieve NUMA-like memory semantics even for a fault-tolerant key-value storage system (a.k.a. a distributed hash table, or DHT). 

KB: Chapter 10

Transactional memory: architectural support for lock-free data structures. Maurice Herlihy and J. Eliot B. Moss. 1993. In Proceedings of the 20th annual international symposium on computer architecture (ISCA '93). ACM, New York, NY, USA, 289-300. DOI=http://dx.doi.org/10.1145/165123.165164

ProteusTM: Abstraction Meets Performance in Transactional Memory. Diego Didona, Nuno Diegues, Anne-Marie Kermarrec, Rachid Guerraoui, Ricardo Neves, and Paolo Romano. 2016. SIGARCH Comput. Archit. News 44, 2 (March 2016), 757-771. DOI: https://doi.org/10.1145/2980024.2872385

10. Thu 3/1

Deep Dive: Key-Value Storage.

Now we'll move on and leave NUMA questions behind. This lecture revisits DHTs.  First, we will look at the grandfather of modern DHTs, the MIT Chord system. Then we'll discuss Amazon's Dynamo and Dynamo DB technologies.
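The key idea Chord and Dynamo share, consistent hashing of both keys and nodes onto a ring, can be sketched briefly (a toy sketch; real Chord locates the successor in O(log N) hops using finger tables, while here we simply scan a sorted list):

```python
import hashlib

def ring_hash(key, ring_bits=16):
    """Hash a string onto a ring of 2**ring_bits positions."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** ring_bits)

def successor(key, node_ids):
    """Chord-style ownership rule: a key belongs to the first node at or
    after the key's position on the ring, wrapping around at the top."""
    pos = ring_hash(key)
    for node in sorted(node_ids):
        if node >= pos:
            return node
    return min(node_ids)      # wrap around the ring

# Hypothetical five-node ring; node names are made up.
nodes = [ring_hash(f"node{i}") for i in range(5)]
owner = successor("some-photo-key", nodes)
```

A nice property follows immediately: adding or removing one node only moves the keys in one arc of the ring, which is why these systems can grow and shrink gracefully.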

Chord: A scalable peer-to-peer lookup service for internet applications. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. 2001. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01). ACM, New York, NY, USA, 149-160. DOI=http://dx.doi.org/10.1145/383059.383071

Dynamo: Amazon's Highly Available Key-Value Store.  Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007.  In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles (SOSP '07). ACM, New York, NY, USA, 205-220. DOI: https://doi.org/10.1145/1294261.1294281

11. Tue 3/6

Transactional database model as applicable to key-value storage.

Many programmers like to think of key-value storage very much like relational databases: each key is kind of like an index, and each value could be a row, like in a relation.  But a system like Dynamo has no transactions!   So to what degree can key-value storage systems be treated like SQL systems? In this lecture we'll just talk about what it would mean to apply a transactional model to a distributed key-value store.  It turns out that Jim Gray warned against doing that in a casual way; we'll touch on his famous paper about this.
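One cautious way to layer transaction-like behavior over a key-value store is optimistic concurrency control with versioned values: a commit succeeds only if nothing it read has changed meanwhile. This sketch is purely illustrative, not Dynamo's actual API:

```python
class VersionedStore:
    """Toy key-value store where every value carries a version number,
    enabling optimistic, lock-free transactions."""
    def __init__(self):
        self.data = {}                 # key -> (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))

    def commit(self, reads, writes):
        """reads: {key: version observed}; writes: {key: new value}.
        Validate first, then apply; no locks are ever held."""
        for key, seen_version in reads.items():
            if self.get(key)[0] != seen_version:
                return False           # someone changed it under us: retry
        for key, value in writes.items():
            version = self.get(key)[0]
            self.data[key] = (version + 1, value)
        return True

store = VersionedStore()
version, _ = store.get("balance")
store.commit({"balance": version}, {"balance": 100})   # succeeds
store.commit({"balance": version}, {"balance": 200})   # stale read: fails
```

Gray's paper below explains why naive schemes along these lines can melt down under contention as the system scales.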

KB Chapter 20

The Dangers of Replication and a Solution. Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. 1996. In Proceedings of the 1996 ACM SIGMOD international conference on Management of data (SIGMOD '96), Jennifer Widom (Ed.). ACM, New York, NY, USA, 173-182. DOI=http://dx.doi.org/10.1145/233269.233330
12. Thu 3/8


How can key-value systems leverage RDMA? Part I: FaRM

Ok... we have the goal (SQL on key-value stores). But we do have some very fancy hardware (RDMA, SSD). Can we pull off a magic solution?  We already saw from Derecho that this sometimes is possible, although isn't necessarily as easy as you might wish! FaRM is the first system to really try (the work predates Derecho, by the way). It works remarkably well, but is optimized for fairly small objects (if you have a big object, you can just store it in the file system and put the URL into FaRM).

FaRM: Fast Remote Memory. Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. 2014. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI '14). USENIX Association, Berkeley, CA, USA, 401-414.
13. Tue 3/13

How do key-value systems use RDMA? Part II: HeRD/FaSST

This is academic research. At CMU, a research team showed that what FaRM is doing might be as much as 3-5x slower than the best possible performance! To do this, they got very fancy in their use of RDMA. We'll see how they did it (warning: geeky lecture, completely off the edge...)

The point these researchers make isn't even related to the issues we talked about when we looked at FaRM and Derecho.  They argue that unreliable datagrams can often outperform reliable ones, and that for small objects (like key-value stores that just hold numbers, or fixed-length URLs), "inlining" the data makes an unexpectedly big difference in speed.


Using RDMA Efficiently for Key-Value Services. Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2014. In Proceedings of the 2014 ACM conference on SIGCOMM (SIGCOMM '14). ACM, New York, NY, USA, 295-306. DOI: https://doi.org/10.1145/2619239.2626299

FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2016. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 185-201.

14. Thu 3/15

Corfu, a Paxos-based logging solution 

DHTs are not the only option for scalable storage. At VMware, the big focus is on a logging technology called Corfu, which was originally invented at Microsoft but never adopted by the company; it then became open-source, and now even has users at companies like Facebook. Corfu manages pools of disks and creates a single large, highly reliable log that applications share. It uses Paxos for the "head of log" agreement, and most applications combine it with Chain Replication (lecture 17) for reliability and fault-tolerance. The clever idea here is that only the Paxos service is a potential bottleneck: the actual replication activity is a concurrent task that different applications can carry out separately and in application-specific ways. This results in a very speedy yet strong form of consistency that fits well with fog scenarios. Very recent work has extended Corfu for use inside a NUMA machine with huge numbers of cores, but we will just touch on that. Cool thing to think about: if you modified Corfu to use a "Byzantine fault-tolerant" head-of-log service, and then chained the log blocks together cryptographically, you would have an ultra-fast blockchain. Just saying...
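The split Corfu makes between sequencing and data transfer can be sketched as follows (a toy model; in real Corfu the sequencer role is made fault-tolerant via Paxos and the storage is a striped pool of flash units, not a dict):

```python
import itertools

class SharedLog:
    """Sketch of the Corfu idea: a tiny sequencer hands out log
    positions, while the appends themselves go directly to storage and
    can proceed concurrently. Only position assignment is centralized."""
    def __init__(self):
        self.sequencer = itertools.count()   # stand-in for the Paxos-backed counter
        self.storage = {}                    # position -> record

    def append(self, record):
        pos = next(self.sequencer)   # cheap, central step
        self.storage[pos] = record   # data transfer, fully decentralized
        return pos

    def read(self, pos):
        return self.storage.get(pos)

log = SharedLog()
a = log.append("debit $5")
b = log.append("credit $5")
```

Because the sequencer only hands out integers, it can run extremely fast even though it is the single point of agreement.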

CORFU: A distributed shared log. Mahesh Balakrishnan, Dahlia Malkhi, John D. Davis, Vijayan Prabhakaran, Michael Wei, and Ted Wobber. ACM Trans. Comput. Syst. 31, 4, Article 10 (December 2013), 24 pages. DOI=http://dx.doi.org/10.1145/2535930

Black-box concurrent data structures for NUMA architectures. Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos Aguilera. ASPLOS 2017: 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Xi'an, China, April 2017.

15. Tue 3/20 Microservice Architectures. 

Systems like the ones that power Netflix, eBay and Amazon.com achieve massive parallelism through a "microservices" approach.  We'll look at the one used by Amazon, but many of its customers (like Netflix) actually share the same technology.  Then we'll dive into its underlying implementation to see how they map it to cloud resources using containers, DHTs, logs, message-bus technology, replication, etc. Amazon and Netflix use Amazon's microservice model.  In contrast, eBay is a big proponent of the OpenStack cloud model, Mesos, and a number of other open standards, and they achieve amazing performance.  They rent containers or bare metal or even VMs, but avoid any dependency on the cloud providers' proprietary stack.  Many students in CS5412 will be doing projects using some of the same technologies!






16. Thu 3/22 Strong consistency versus Scalability First: A cloud-computing dilemma. 

Today's (short) lecture will wrap up lectures 1-15 by looking at a puzzle that unifies the whole series of topics: strong consistency in the sense of Paxos or transactions (or a more complicated service that uses consistency, like the Corfu log), versus CAP-style computing with very heavy use of caches that might not be "coherent".  The puzzle, in a nutshell, is this: Paxos and other agreement ("consensus") mechanisms are complicated, and as we scale them up, that complexity grows.  CAP argues for simplicity.  Yet as we have seen (notably in connection with datacenter-scale IP multicast and then with RDMA+DCQCN on RoCE) even very simple mechanisms become complex at scale, and this complexity rivals that of any consistency preserving protocol.  So if we take CAP to be a variant on the End-to-End principle, but we recognize that CAP itself was naive about complexity, what should we conclude?

We will have time for any last-minute questions about Lectures 1-15 at the end of class.

The prelim will be available starting at the end of class and is due the next day: Friday 3/23, at 5pm.  Once you start work, you should finish within 24 hours.  You will submit a PDF file on CMS (you are expected to type your solutions, not hand-write them). 

You can find past examples of in-class prelims in the old format on the web sites for past offerings of cs5412, but those were sit-down 1-hour exams taken in class.  The new take-home format is an experiment, and the class itself has been redesigned since 2016 when it was last offered.  As a result, we don't have old sample exams to share.  Some topics from 2016 were not covered in 2018, and vice-versa.

17. Tue 3/27 Chain Replication. 

By now we have heard about primary-backup replication several times.  We also heard about Paxos and Derecho, but even though they offer the strongest guarantees, they bring a significant degree of complexity associated with their generality.

Chain replication emerged as a response to the ever-growing complexity of "full-fledged" replication, as with Paxos.  The focus of chain replication is on maintaining one or more backups in cloud-scale storage systems, especially for tier-two systems (the same layer where most of the DHTs we discussed are used).
Cornell professors Van Renesse and Schneider got interested in whether they could create the simplest possible data replication scheme that would still offer atomic updates (one-shot transactions).  They came up with Chain Replication, and it has become very popular.  With a chain of two nodes you get the familiar primary-backup model, but the approach supports K replicas for any value of K.  The approach is genuinely very stripped-down, yet has surprisingly strong guarantees. We'll also see that the method combines nicely with a technology called Sinfonia, created at Hewlett Packard (yes, they do cloud systems too!)
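The chain idea is simple enough to sketch in a few lines (illustrative only; the real protocol also handles node failures by splicing the chain, and acknowledges writes back from the tail, which this toy ignores):

```python
class ChainNode:
    """Sketch of chain replication: updates enter at the head and flow
    node-by-node toward the tail; reads go to the tail, so a read only
    ever sees updates that have reached every replica."""
    def __init__(self, next_node=None):
        self.data = {}
        self.next = next_node

    def update(self, key, value):
        self.data[key] = value
        if self.next is not None:
            self.next.update(key, value)   # propagate toward the tail

# A three-node chain: head -> middle -> tail.
tail = ChainNode()
middle = ChainNode(tail)
head = ChainNode(middle)

head.update("x", 1)        # clients write at the head...
result = tail.data["x"]    # ...and read at the tail
```

Notice why the guarantee is strong: by the time an update is visible at the tail, every node in the chain already stores it.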

Chain Replication for Supporting High Throughput and Availability. Robbert van Renesse, Fred. B. Schneider. Sixth Symposium on Operating Systems Design and Implementation (OSDI 04). December 2004, San Francisco, CA.

Sinfonia: A new paradigm for building scalable distributed systems. Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, and Christos Karamanolis. ACM Trans. Comput. Syst. 27:3, (Nov. 2009). DOI=http://dx.doi.org/10.1145/1629087.1629088

18. Thu 3/29


Zookeeper: A File System for Cloud Computing and Distributed Coordination.

The Zookeeper system is a file system that implements a version of Paxos (they call it Zookeeper Atomic Broadcast, or ZAB).  By offering reliability through a simple, file-system-like API, Zookeeper emerged as the dominant solution for reliable coordination and storage in many cloud platforms.  It is a hugely popular and widely used technology today.

ZooKeeper: Distributed Process Coordination. Flavio Junqueira and Benjamin Reed. 2017, O'Reilly. ISBN-13: 978-1449361303. ISBN-10: 1449361307.

Apache ZooKeeper site: https://zookeeper.apache.org

-- Tue 4/3 Spring Break, no classes.
-- Thu 4/5 Spring Break, no classes.
19. Tue 4/10

HDFS, HBase, YARN and Hadoop (a version of MapReduce).

Hadoop is a popular open-source version of MapReduce. We'll discuss the MapReduce model, and then talk about how Hadoop actually works.  Hadoop is a way to run parallel batch applications on files, using a scheduler called YARN and a set of file access accelerators called HBase, PIG and Hive.  We won't talk about PIG and Hive in this lecture.
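The MapReduce model itself fits in a few lines of Python (a single-process sketch for intuition; Hadoop's real contribution is running these same phases fault-tolerantly across thousands of machines over HDFS):

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Sketch of the MapReduce model: map emits (key, value) pairs,
    a shuffle groups them by key, and reduce combines each group."""
    groups = defaultdict(list)
    for record in records:                  # map + shuffle phases
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(values)          # reduce phase
            for key, values in groups.items()}

# The classic example: word count over some toy "files".
lines = ["the cloud", "the edge and the cloud"]
counts = map_reduce(
    lines,
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=sum,
)
```

Because map calls are independent and each reduce group is independent, both phases parallelize trivially, which is the whole point of the model.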


MapReduce and Hadoop are such huge topics that it is hard to know where to begin.  The WikiPedia article gives a good overview of the Hadoop platform and has links with detail on HDFS, HBase, YARN, etc.  There are also journal papers written at Google on their versions, which were the original ones: MapReduce, GFS, etc.

20. Thu 4/12

Hive and PIG.  Publish-Subscribe Messaging Systems. Kafka.

We will continue to explore the ecosystem associated with the Hadoop community.

First we will look at Hive and PIG, which are more tools for file access, but a bit fancier than HBase.

Then we will switch topics a bit, and discuss Kafka.  You might think that files and messaging are totally different ideas, but a long history of work on "message oriented middleware" has yielded some very popular systems that merge the two concepts: in these, you can generate objects that are not just stored, but also "notified" in the sense that applications can monitor files or "topics" for updates.  Examples found in the cloud include Kafka and OpenSplice.
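The merged "stored plus notified" view can be sketched with a toy broker (illustrative only; real Kafka adds partitions, offsets, retention policies, and consumer groups, none of which are modeled here):

```python
from collections import defaultdict

class Broker:
    """Toy publish-subscribe broker in the spirit of Kafka: each
    message for a topic is both appended to a retained log (the
    'file' view) and pushed to subscribers (the 'messaging' view)."""
    def __init__(self):
        self.logs = defaultdict(list)         # topic -> stored messages
        self.subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        self.logs[topic].append(message)      # retained, like a file
        for notify in self.subscribers[topic]:
            notify(message)                   # pushed, like messaging

broker = Broker()
seen = []
broker.subscribe("sensor-updates", seen.append)
broker.publish("sensor-updates", {"temp": 21})
```

A late-joining consumer could replay `broker.logs[topic]` from the beginning, which is exactly the behavior that makes log-based systems like Kafka so useful.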

KB:  Chapter xx

FG: Chapter xx

Apache Kafka Site: https://kafka.apache.org/
21. Tue 4/17

Spark/Databricks (RDDs).

If you have never heard of Spark, Databricks, and TensorFlow, you are just not nearly as geeky as you think. But don't worry! After this lecture, you'll be ready for the next party. We'll learn so many acronyms your head will spin... but the technology is genuinely cool. These are the core systems used in modern cloud systems for machine learning and data processing.

TensorFlow is actually very closely related to Spark, but we'll talk about it later.  So in this lecture we will focus on Spark/Databricks (two names for one system -- Spark is the free version, and Databricks is a company that has commercialized Spark and built all sorts of products over it). 

Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud 2010.

Improving MapReduce Performance in Heterogeneous Environments, M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz and I. Stoica, OSDI 2008, December 2008.


22. Thu 4/19

Integrating the Freeze Frame File System with RDDs.

This will be a fairly geeky talk. Early in the course we heard about FFFS (it was the real-time file system with strong consistency mentioned in Lecture 4). Now that we know about Spark and TensorFlow, it may sound obvious that they should be able to leverage one another. But how would one actually connect a temporal file system like FFFS to one of these? We'll focus on RDDs for Spark, just because we have actually done this and made it work, and we'll see (1) how one does this, (2) what power the RDDs bring to FFFS, and (3) what value the FFFS temporal snapshot model (temporal accuracy and causal consistency) brings to Spark. One could do something very similar with TensorFlow, using its Python coding style.

See readings for lectures 4, 21.

23. Tue 4/24


TensorFlow.

This lecture will move beyond our focus on Spark and things that can talk to Spark.  We'll take a short look at TensorFlow, a huge new "movement" originating at Google that is rapidly sweeping the fog and edge computing community.

TensorFlow: A System for Large-Scale Machine Learning. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265-283.
24. Thu 4/26

Guest lecture: Caspar.ai smart home concept

[Ashutosh Saxena, Caspar.ai CEO]

Wikipedia Article on Smart Devices, Smart Homes, Smart Highways, Smart Grid.

Caspar.ai is working on smart homes, which puts them right in the middle of this smart-device ecosystem.

Caspar.ai web site
25. Tue 5/1

Secure and private querying for in-home sensors.

One puzzle with fog computing is that for reasons of privacy, we will often need to keep sensitive data in the home or on the edge devices.  So we need a way to query that data, and obviously we wouldn't want the queries to reveal all the secret stuff.  Edward Tremel has a solution to this; we'll invite him to explain his approach and present some evaluation he has done.

[This lecture may be given by Edward Tremel; Ken potentially out of town]

26. Thu 5/3

Wide area challenges. Concept of mirroring. Limitations.

In this lecture, our topic will be wide-area systems. In the cloud, there are lots of data centers, and most applications are like the Facebook image system, with global span. How do the subsystems talk to one another over the wide-area network? We'll see that the trick centers on read-only mirroring, but that applications need to be very latency-tolerant to make use of this model. We'll also learn about yet another Cornell technology called the Smoke and Mirrors File System, which offers a surprising way to get a bit of extra performance out of those WAN links.

Smoke and Mirrors: Reflecting Files at a Geographically Remote Location Without Loss of Performance. Hakim Weatherspoon, Lakshmi Ganesh, Tudor Marian, Mahesh Balakrishnan (MSR), and Ken Birman. In Proceedings of 7th USENIX Conference on File and Storage Technologies (FAST '09). San Francisco, CA. February 24-27, 2009.
27. Tue 5/8

Google's Spanner system and how it achieves consistency at world-wide scale.

Last lecture of the semester! Spanner is a fairly new Google project. By using GPS clock synchronization in a clever way, Spanner obtains world-wide consistency for edge updates. At the same time, Spanner can bring fairly long delays. This sheds important light on a kind of locality that will be very important in future edge and fog applications: for data consistency at local scale, we need reactive edge tools, but data consistency at world-wide scale is tricky because it comes with a significant temporal lag that can't easily be concealed (unless you figure out a way to work around the speed of light limit...)
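The clock-synchronization trick can be sketched abstractly (a toy model with a simulated millisecond clock; real Spanner's TrueTime API returns an interval of possible current times, and the actual commit-wait rule is more subtle than shown here):

```python
def commit_wait(now_fn, sleep_fn, uncertainty):
    """Sketch of Spanner's commit-wait idea: after choosing a commit
    timestamp beyond the clock-uncertainty bound, wait out that bound
    before making the write visible, so no node with a lagging clock
    can observe a commit timestamp 'in the future'."""
    commit_ts = now_fn() + uncertainty   # latest the commit could be "now"
    sleep_fn(uncertainty)                # the unavoidable temporal lag
    return commit_ts                     # timestamp is now safely in the past

# Simulated clock (in milliseconds) so the example runs instantly.
clock = {"t": 100_000}
ts = commit_wait(lambda: clock["t"],
                 lambda d: clock.__setitem__("t", clock["t"] + d),
                 uncertainty=7)
```

The sketch makes the lecture's point concrete: the tighter the clock synchronization (smaller `uncertainty`), the shorter the mandatory wait, which is why Spanner invests in GPS and atomic clocks.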

-- 5/9-5/11

Final project demos should be scheduled in this period.

Everyone will need to schedule a demo for their work, and will have to prepare a poster. The whole team should be there for the demo. We spread these over a few days.