This is the archived Fall 2018 site.  For 2019, visit the current course page.

CS6465: Emerging Cloud Technologies and Systems Challenges

Hollister Hall Room 320, Tuesday/Thursday 1:25-2:40

CS6465 is a PhD-level class in systems that tracks emerging cloud computing technology, opportunities and challenges. It is unusual among CS graduate classes: the course is aimed at a small group of students, uses a discussion-oriented style, and the main "topic" is actually an unsolved problem in computer systems. The intent is to think about how one might reduce that open problem to subproblems, learn about prior work on those, and extract exciting research questions.  The PhD focus centers on that last element.

In this second offering, we plan to focus on issues raised by moving machine learning to the edge of the cloud. In this phrasing, edge computing still occurs within the data center but, for reasons of rapid response, involves smart functionality close to the client, under time pressure.  So you would think of an AI or ML algorithm written in as standard a way as possible (perhaps TensorFlow, or Spark/Databricks using Hadoop, etc.).  But whereas that sort of code normally runs deep in the cloud, many minutes or hours from when data is acquired, the goal now is to keep the code unchanged (or minimally changed) and run it on the stream of data as it flows into the system, milliseconds after it was acquired.  We might also push aspects of machine-learned behavior right out to the sensors.

This idea is a big new thing in cloud settings -- they call it "edge" computing or "intelligent" real-time behavior.  But today edge computing often requires totally different programming styles than back-end computing.  Our angle in CS6465 is really to try to understand why this is so: could we more or less "migrate" code from the back end to the edge?  What edge functionality would this require?  Or is there some inherent reason that the techniques used in the back-end platforms simply can't be used at the edge, even with some sort of smart tool trying to help?

The goal of this focus on an intelligent edge is, of course, to motivate research on the topic.  As a systems person, Ken's group is thinking about how to build new infrastructure tools for the intelligent edge.  Those tools could be the basis of great research papers and might have real impact.  But others work in this area too, and we'll want to read papers they have written. 

Gaps can arise at other layers too.  For example, TensorFlow is hugely popular at Google in the AI/ML areas, and Spark/Databricks plus Hadoop (plus Kafka, Hive, HBase, ZooKeeper, not to mention MATLAB, SciPy, GraphLab, Pregel, and a gazillion other tools) are insanely widely used.  So if we assume that someone is a wizard at solving AI/ML problems using this standard infrastructure, but now wants parts of their code to work on an intelligent edge, what exactly would be needed to make that possible?  Perhaps we would need some new knowledge representation, or at least some new way of storing knowledge, indexing it, and searching for it.  This would then point to opportunities for research at the AI/ML level as well as opportunities in databases or systems to support those new models of computing.

CS6465 runs as a mix of discussions and short mini-lectures (mostly by the professor), with some small take-home topics that might require a little bit of out-of-class research, thinking and writing. There won't be a required project, or any exams, and the amount of written material required will be small, perhaps a few pages to hand in per week. Grading will mostly be based on in-class participation.

CS6465 can satisfy the same CS graduate requirements (in the systems area) as any other CS6xxx course we offer.  Pick the course closest to your interests, no matter what you may have heard.  CS6410 has no special status.

Schedule and Readings/Slides

Date Topic Readings, other comments on the topic Thought questions
Thu Aug 23 1. Overview of our topic: bringing machine learning to the edge.  Just a get-to-know-you meeting.  Ken will probably show some Microsoft slides from a recent MSR faculty summit where they told us about Azure IoT Edge and the "Intelligent Edge".
Tue Aug 28 2. Consistency requirements for distributed machine learning at the edge.  Microsoft's FaRM system.  The first part of this meeting will focus on a discussion of what consistency should mean for real-time distributed machine learning systems running close to the edge of the cloud. 

The second part will dive in and look at the FaRM paper, in part keeping in mind our idea of what forms of consistency are needed.  The link is here:

FaRM: Fast Remote Memory. Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI '14). USENIX Association, Berkeley, CA, USA, 401-414, 2014.
Thu Aug 30 3. Is FaRM the ideal solution to the RDMA DHT problem?  HERD and FaSST.

We shouldn't take FaRM for granted.  So we'll look at the competition!  But keep those questions about consistency in mind...  If you only read one, read the first of these.

Using RDMA Efficiently for Key-Value Services. Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2014. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, New York, NY, USA, 295-306.

FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2016. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI '16). USENIX Association, Berkeley, CA, USA, 185-201.

Tue Sept 4 4. A different view on consistency: Leslie Lamport's causal ordering model and the Chandy/Lamport concept of a consistent cut.  Kulkarni timestamps.  Slides.   [Taught by Theo Gkountouvas because Ken will be out of town] Again, if you don't have time to read all of these, read the first one, or the first and the second.  Both are classics!

Time, clocks, and the ordering of events in a distributed system. Leslie Lamport. Commun. ACM 21, 7 (July 1978), 558-565.

Distributed snapshots: determining global states of distributed systems. K. Mani Chandy and Leslie Lamport. ACM Trans. Comput. Syst. 3, 1 (February 1985), 63-75.

Logical Physical Clocks. S. S. Kulkarni, M. Demirbas, D. Madappa, B. Avva, and M. Leone. In Principles of Distributed Systems. Springer, 2014, pp. 17-32.
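As a warm-up for the readings, Lamport's logical-clock rule from the first paper fits in a few lines (a sketch with my own class and method names, not code from any of the papers): each process keeps a counter, ticks it on every local event and send, and on receipt jumps past the sender's timestamp.

```python
class LamportClock:
    """Logical clock per Lamport (1978): local events tick the counter;
    a receive advances it past the sender's timestamp."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # A send is a local event; the message carries the new timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Advance past both our own history and the sender's.
        self.time = max(self.time, msg_time) + 1
        return self.time

# If event a happens-before event b, then clock(a) < clock(b).
p, q = LamportClock(), LamportClock()
t_send = p.send()           # event a at process P
t_recv = q.receive(t_send)  # event b at process Q, causally after a
assert t_send < t_recv
```

Note the converse does not hold: clock(a) < clock(b) does not imply a happened before b, which is exactly the gap the Kulkarni hybrid-clock paper tries to narrow by mixing in physical time.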
Thu Sept 6 5.  The Freeze-Frame File System is built around the idea of offering Lamport's concept as the basis of a consistency model for stored files.  We'll see how this works.  Slides from Theo are here. [Taught by Theo Gkountouvas because Ken will be out of town]

The Freeze-Frame File System. Weijia Song, Theo Gkountouvas, Qi Chen, Zhen Xiao, Ken Birman. ACM Symposium on Cloud Computing (SoCC 2016). Santa Clara, CA, October 5-7, 2016.

Tue Sept 11 6. State Machine Replication and the Paxos model.  Introduction and overview.  Roles of Paxos in the Apache Hadoop "ecosystem".  Zookeeper model of how to make Paxos look like a file system.

Slides from Theo on Paxos protocols. 

Replication management using the state-machine approach. Fred B. Schneider. In Distributed systems (2nd Ed.), Sape Mullender (Ed.). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA 169-197.

The Part-Time Parliament. Leslie Lamport. ACM Trans. Comput. Syst. 16, 2 (May 1998), 133-169.

Paxos made Moderately Complex.  Robbert van Renesse and Deniz Altinbuken. ACM Comput. Surv. 47, 3, Article 42 (February 2015), 36 pages.

Not simple... just think of Paxos as "2 1/2 phase commit used to deliver a message to every process, in order, with durability."  But keep in mind that this doesn't include the extra phases that may be needed by the proposer (leader) to resolve concurrency conflicts or to clean up after failures.
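To make that phase structure concrete, here is a toy single-decree acceptor, the piece that gives Paxos its safety guarantees (a sketch with my own names, not Lamport's or van Renesse's pseudocode). Phase 1 is the proposer's prepare round, phase 2 the accept round; the "half phase" is announcing the chosen value.

```python
class Acceptor:
    """Toy single-decree Paxos acceptor (sketch).
    Never accepts a proposal numbered below its current promise."""
    def __init__(self):
        self.promised = -1    # highest ballot number promised so far
        self.accepted = None  # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        # Phase 1b: promise, and report any previously accepted value
        # so the proposer is forced to re-propose it.
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, ballot, value):
        # Phase 2b: accept only if no higher-numbered promise was made.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "nack"

# A proposer that wins a majority of promises, then a majority of
# accepts, has gotten its value chosen.
accs = [Acceptor() for _ in range(3)]
promises = [a.prepare(1) for a in accs]
assert sum(r == "promise" for r, _ in promises) >= 2
acks = [a.accept(1, "x") for a in accs]
assert sum(r == "accepted" for r in acks) >= 2  # value "x" is chosen
```

The cleanup phases the note above mentions show up here as the "report any previously accepted value" step: a new proposer with a higher ballot must adopt whatever a majority may already have accepted.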
Thu Sept 13 7.   Zookeeper:  A deeper dive.  ZAB protocols.  Zookeeper API.

A simple totally ordered broadcast protocol. Benjamin Reed and Flavio P. Junqueira. 2008. In Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS '08). ACM, New York, NY, USA, Article 2, 6 pages.

The life and times of a zookeeper. Flavio P. Junqueira and Benjamin C. Reed. 2009. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (PODC '09). ACM, New York, NY, USA, 4.
ZooKeeper: Distributed Process Coordination. Flavio Junqueira and Benjamin Reed. O'Reilly, 2017. ISBN-13: 978-1449361303. ISBN-10: 1449361307. Apache ZooKeeper site.
Question: Is ZAB described in a clear and convincing way in the LADIS paper, or in the PODC paper?

ZAB is a multicast protocol, not a durable storage protocol, but Zookeeper uses periodic checkpoints, once every five seconds, to provide persistent storage.  Does this policy actually implement Paxos?  If not, what are some of the ways an application might notice the difference?
Tue Sept 18 8.   The two results we will talk about are: the FLP impossibility result and the weakest failure detector for guaranteeing progress. 

Some slides from Theo are here.
Impossibility of distributed consensus with one faulty process. Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. In Proceedings of the 2nd ACM SIGACT-SIGMOD symposium on Principles of database systems (PODS '83). ACM, New York, NY, USA, 1-7.

The weakest failure detector for solving consensus. Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. 1996. J. ACM 43, 4 (July 1996), 685-722.
This lecture is about some very difficult, yet important, mathematics.  Think of it as a one-day theoretical "side tour" into the mathematics of the distributed computing area.  

Some CS6465 students lack the background to read these kinds of papers.  Don't panic!  Not everyone is prepared to read this sort of very difficult theory.  Even so, it can be useful to know about it.  If you can't follow the math, just try to understand the basic idea of what they are saying.

A thought question: what does "impossibility" mean in the FLP paper?
Thu Sept 20 9.  Corfu: An append-only log system that combines Paxos with Chain Replication.  CORFU: A distributed shared log. Mahesh Balakrishnan, Dahlia Malkhi, John D. Davis, Vijayan Prabhakaran, Michael Wei, and Ted Wobber. ACM Trans. Comput. Syst. 31, 4, Article 10 (December 2013), 24 pages.

Chain Replication for Supporting High Throughput and Availability. Robbert van Renesse, Fred. B. Schneider. Sixth Symposium on Operating Systems Design and Implementation (OSDI 04). December 2004, San Francisco, CA.
The last lectures really focused on the idea of consensus, and on the idea that in some sense it can be exposed in several ways (the Paxos log, or as an atomic multicast like ZAB).

Corfu is back to the Paxos log, but has a very different way to implement it, using Paxos just for a kind of counter (the end of log pointer), and then using a simple copying method (chain replication) for fault-tolerance.  Is this "legal" or does it break the Paxos properties?
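The "simple copying method" is easy to sketch (toy code with my own names, not the paper's protocol details): writes enter at the head of the chain and propagate node by node; reads go to the tail, so a read only ever sees data that is already on every replica.

```python
class ChainNode:
    """One replica in a toy chain-replication chain (sketch)."""
    def __init__(self):
        self.store = {}
        self.next = None  # successor in the chain; None at the tail

    def write(self, key, value):
        # Apply locally, then forward down the chain; the write is
        # acknowledged only once it reaches the tail.
        self.store[key] = value
        if self.next is not None:
            return self.next.write(key, value)
        return "ack"  # tail reached: the update is on every replica

def make_chain(n):
    nodes = [ChainNode() for _ in range(n)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes

head, _, tail = make_chain(3)
assert head.write("k", 42) == "ack"
assert tail.store["k"] == 42  # reads at the tail see only committed writes
```

Corfu's twist is that the chain replicates log pages, while the sequencer (the Paxos-backed "end of log" counter) hands out positions, so the two mechanisms split the consensus work between them.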
Tue Sept 25 10. vCorfu: A way of scaling Corfu up by using lots of logs ("sharding") and virtualizing the log that individual applications deal with ("filtering"). vCorfu: A Cloud-Scale Object Store on a Shared Log.  Michael Wei, Amy Tai, Christopher J. Rossbach, Ittai Abraham, Maithem Munshed, Medhavi Dhawan, Udi Wieder, Scott Fritchie, Steven Swanson, Michael J. Freedman, Dahlia Malkhi.  NSDI 2017.
Corfu became popular and it forced the developers to scale far beyond what they originally had in mind.  They ended up with a concept for running Corfu with a lot of logs, not just one.  Additionally, they have a kind of "materialized view" of the global log for efficiency, something they call the "object stream".  The basic idea is to let Tango (their transactional layer) have a rapid and complete cached log and then keep the full log elsewhere, to avoid inefficient access patterns and "holes", which are a problem for them.

When you take this to the limit, is Corfu still a log?

Thought question: why is vCorfu not making more aggressive use of checkpoints?
Thu Sept 27 11. World's fastest Paxos solution: the Derecho C++ library. Derecho: Fast State Machine Replication for Cloud Services.  Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Matthew Milano, Weijia Song, Edward Tremel, Sydney Zink, Kenneth P. Birman, Robbert van Renesse. Submitted for publication, September 2017, revised and resubmitted July 2018.

RDMC: A Reliable Multicast for Large Objects. Jonathan Behrens, Sagar Jha, Ken Birman, Edward Tremel.  To appear, IEEE DSN ’18, Luxembourg, June 2018.
Derecho looks at the mapping of Paxos to fast hardware: the modern RDMA technology.  What benefits does this bring?

We will also talk about virtual synchrony and the epoch model it uses.
Tue Oct 2 12. So, why is Derecho this fast?  Asynchronous flow programming and "refactoring" Paxos to match the hardware. (same papers) I want to devote one whole meeting to just understanding precisely why Derecho turns out to be so fast, because there is a "portable insight" here that applies to other systems.

So our topic will look at Derecho's speed, but with the goal of asking what applications need to do to leverage that speed.  This leads to an open research topic ("Zero copy software libraries and operating systems").
Thu Oct 4 13.  What abstractions will edge programmers actually want? Remote regions: A simple abstraction for remote memory. Marcos K. Aguilera, Nadav Amit, Irina Calciu, Xavier Deguillard, Jayneel Gandhi, Stanko Novakovic, Arun Ramanathan, Pratap Subrahmanyam, Lalith Suresh, Kiran Tati, Rajesh Venkatasubramanian, and Michael Wei.  To appear, NSDI 2018.

Slides from this talk.
I don't want to have three full lectures on Derecho, but my topic is still tied to Derecho: If our plan is to use Derecho near the edge, is the C++ library API a sensible way to expose it?  Or should it look like a file system (think back to Theo's lecture on FFFS), or like Zookeeper, or perhaps something else?

To bring a new perspective in, we'll also talk about work at VMware that focuses on an ultra-simple remote memory idea, also built on RDMA.

(Oct 6 - Oct 9) Fall break, no class. Let's hope for amazingly colorful leaves!  (It somewhat depends on the timing of the first real frost: we need one or two nights of really cold weather to trigger a "flash" from green to bright colors, and that doesn't happen some years.)
Thu Oct 11 14. Continued discussion of the roles that technologies like the ones we've discussed in class up to now might play.

We will situate our discussion in the context of an edge infrastructure using modern function computing models.
"Cloud functions" are a hot new model that seems to be the next big thing for programming cloud applications.  You can read about Azure Functions, function PaaS models, or Amazon AWS Lambda to see examples.

A cloud function is really just a short-running program triggered by some kind of event (think of a remote method invocation), that does anything you like, and then terminates.  These functions don't retain any local data (they are "stateless") but they can definitely write to the file system or to a database, etc.  They just don't create local data structures that would be used on the next event -- each event sees a "clean" initial state.

Functions can normally be coded in languages like Python, although Microsoft prefers C# .NET or F# .NET.   Functions run as programs inside container environments, and really are no different from other programs in private virtual machines.  But the model is intended to be very lightweight, with millisecond startup delays, and very elastic: "pay for cycles you actually use."

Functions run on very basic VMs, and hence don't have direct access to things like local GPU accelerators (one can definitely access a GPU accelerator from a container VM if the system is set up to allow that, but a function "server" wouldn't be configured that way).  Instead, think of a function as the conductor of an orchestra: it sends various tasks on their merry way, but does little direct work of its own.  So for GPU tasks, a function would typically hand objects like images off to GPU servers that have accelerators attached to the server nodes.  This is a source of delay, but common in today's solutions.
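The statelessness described above can be illustrated with a toy handler (hypothetical names throughout; this is not the API of any real FaaS platform): each invocation starts from a clean slate, so anything that must survive between events goes to an external store.

```python
# Stand-in for an external service (database, blob store, ...); on a
# real platform this would be a client for S3, Cosmos DB, etc.
external_store = {}

def handle_event(event):
    """A toy 'cloud function': triggered by an event, runs to
    completion, and keeps no local state between invocations."""
    # Any state must be fetched from, and written back to, the
    # external store -- the function itself remembers nothing.
    count = external_store.get(event["key"], 0) + 1
    external_store[event["key"]] = count   # durable side effect
    return {"key": event["key"], "seen": count}

# Each call behaves as a fresh instance; only the external store persists.
assert handle_event({"key": "sensor-7"})["seen"] == 1
assert handle_event({"key": "sensor-7"})["seen"] == 2
```

The round trip to the external store on every event is exactly the kind of delay the "conductor" framing above glosses over, and one reason an intelligent edge might want something faster than a database behind its functions.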

There isn't any special reading for this lecture: it basically continues a topic we didn't have time to finish (we didn't even really start) a week ago Thursday, so we'll continue on the same subject today.

To avoid repetition, I'll run through Microsoft FarmBeats (a digital agriculture application) in the context of Azure IoT Edge, and then that will give us a bunch of example use cases to think about.  If you like, you can Google Microsoft FarmBeats to see some video demos and online materials.

Tue Oct 16 15.  Spark RDDs and file system caching performance.

Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud 2010.

Improving MapReduce Performance in Heterogeneous Environments, M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz and I. Stoica, OSDI 2008, December 2008. 

Spark is the world champion of "big data" computing, but it normally runs in a batch style and is viewed as a kind of back-end layer of the cloud, doing big computations offline that you'll later draw on through massive files that represent the output (machine-learned models, precomputed indices, etc.).  The RDD model is a cool and widely popular example of a different kind of function PaaS, even though they don't really pitch it that way.  In fact Spark RDDs could be of real interest near the edge, even without MapReduce (RDDs can be used from SciPy, GraphLab, MATLAB, Mathematica...).

Questions to think about: RDDs give Spark a big benefit for Hadoop jobs, but those are used mostly in the back-end of the data center for analytics.  Could there also be an edge opportunity?  What would reuse of RDDs at the edge require?
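The core RDD idea, transformations recorded lazily as a lineage, plus optional caching of materialized results, can be sketched in plain Python (toy classes of my own invention, not the Spark API):

```python
class ToyRDD:
    """Toy RDD: records transformations lazily and recomputes from
    lineage on demand, unless the result has been cached."""
    def __init__(self, compute):
        self._compute = compute   # closure that can rebuild the data
        self._cache = None

    @staticmethod
    def parallelize(data):
        return ToyRDD(lambda: list(data))

    def map(self, f):
        # Nothing runs yet: we just extend the lineage.
        return ToyRDD(lambda: [f(x) for x in self.collect()])

    def filter(self, p):
        return ToyRDD(lambda: [x for x in self.collect() if p(x)])

    def cache(self):
        # Materialize once (real Spark caches lazily, on first action).
        self._cache = self.collect()
        return self

    def collect(self):
        # An "action": use the cache if present, else replay the lineage.
        return self._cache if self._cache is not None else self._compute()

nums = ToyRDD.parallelize(range(5)).map(lambda x: x * x).cache()
assert nums.collect() == [0, 1, 4, 9, 16]
assert nums.filter(lambda x: x > 3).collect() == [4, 9, 16]
```

Recomputing from lineage instead of checkpointing every intermediate result is what makes RDDs cheap to create; whether that trade-off still makes sense when the input is a live edge data stream is one version of the thought question above.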
Thu Oct 18 16.  More on RDD programming

Same papers

There is more to this whole RDD concept than we can cover in one lecture, so we'll continue on the topic and look at some of the complexities of getting good RDD behavior.  It comes down to understanding (more or less) the way that Spark itself really works.
Tue Oct 23 17. The amazing power of GPUs and GPU clusters.  CUDA.  Dandelion: a programming tool for GPU management.

Dandelion: a compiler and runtime for heterogeneous systems. Christopher J. Rossbach, Yuan Yu, Jon Currey, Jean-Philippe Martin, and Dennis Fetterly. 2013. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 49-68.

This topic is kind of a pivot for us.  We'll start to look at hardware accelerators we might want to attach to our edge computing infrastructure.  GPUs are normally programmed using a language called CUDA, but there is a perception that CUDA is a barrier to widespread exploitation of the technology.  Dandelion is one example of a response (not super successful, but very well explained).
Thu Oct 25 18. Challenges of integrating GPUs with other parts of the O/S stack, RDMA, etc.

These three papers look at aspects of a single system being created by Mark Silberstein's group.  No need to read them all, but do spend enough time to know what each is about. 

GPUfs: the case for operating system services on GPUs. Mark Silberstein, Bryan Ford, and Emmett Witchel. 2014. Commun. ACM 57, 12 (November 2014), 68-79.

SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs. Shai Bergman, Tanya Brokhman, Tzachi Cohen, Mark Silberstein. USENIX ATC, 2017.

GPUnet: Networking Abstractions for GPU Programs.
Sangman Kim, Seonggu Huh, Yige Hu, Xinya Zhang, and Emmett Witchel, Amir Wated and Mark Silberstein. OSDI 2014.

Once you are building the GPU service itself, you need ways to get to data, and to the network.  Here are a few projects that tackled those topics.
Tue Oct 30 19. Special guest lecture. Mark Silberstein is the leader of the group that created GPUfs, SPIN and GPUnet.  Today Mark will give a guest lecture, but in a non-standard room.  The lecture is at the usual time, 1:30pm, but will be in 122 Gates Hall.  Please attend!

Title: An Infrastructure for Inline Acceleration of Network Applications

Abstract: With rising datacenter network rates, cloud vendors are deploying FPGA-based SmartNICs to achieve cost-effective acceleration for hypervisor networking tasks and network functions. However, attempts to use SmartNICs’ inline data-processing capabilities for accelerating general purpose server applications in clouds have been limited.  NICA is a hardware-software co-designed framework for inline acceleration of application data plane on FPGA-based SmartNICs in multi-tenant systems. A new ikernel programming abstraction, tightly integrated with the network layer, enables applications to control SmartNIC computations and to activate processing on the application network traffic. NICA’s virtualization architecture supports fine-grain time-sharing of the SmartNIC logic, and I/O path virtualization that together enable cost-effective use of SmartNICs by multiple tenants with strict performance guarantees.

Thu Nov 1 20. TensorFlow, both as a tool to control GPU or TPU computations and as a more general programming language for distributed computing.

TensorFlow: A System for Large-Scale Machine Learning
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265-283.

TensorFlow is portrayed as a distributed systems tool, and it certainly supports useful forms of compositionality.  Nonetheless, it turns out to be used primarily on a single machine at a time to manage applications that talk to local hardware like GPU clusters or TPU accelerators.  Beyond looking at the work as they present it in this paper and in slides, in this lecture we will discuss the (mixed) experiences people have had with TensorFlow as a true distributed programming tool.

TensorFlow emerged in part because Google has some skepticism about the CUDA + GPUfs + GPUnet concept. 

Mostly, TensorFlow is used on a single computer to control a single attached GPU or TPU cluster.  But it can also support fault-tolerant distributed computing, in its own unique style.  Nobody really knows how effective it is in that fancier style of use.
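The deferred-execution dataflow idea at the heart of the paper can be sketched in a few lines of plain Python (toy classes of mine, not the TensorFlow API): you first build a graph of operations, and only later run it against concrete inputs.

```python
class Node:
    """One vertex in a toy dataflow graph: an op plus its inputs."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        # Inputs are looked up in the feed dict; every other node is
        # evaluated by recursing over its input edges.
        if self.op == "input":
            return feed[self]
        vals = [n.run(feed) for n in self.inputs]
        return self.op(*vals)

x = Node("input")
y = Node("input")
z = Node(lambda a, b: a * b, x, y)   # graph built; nothing computed yet
out = Node(lambda a: a + 1, z)
assert out.run({x: 3, y: 4}) == 13   # evaluation happens only at run time
```

Because the whole computation exists as a graph before anything executes, a runtime can in principle place different subgraphs on different devices or machines, which is exactly the distributed story whose real-world effectiveness we'll be debating.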
Tue Nov 6 21. Berkeley's recent work on "Ray"

Ray: A Distributed Framework for Emerging AI Applications. Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica, UC Berkeley.  OSDI 2018.

The U.C. Berkeley group wasn't convinced that TensorFlow and RDDs solve every edge computing need.  Ray is their recent proposal for an edge processing language oriented toward AI applications.
Thu Nov 8 22. Routing data through an FPGA: the Catapult model.

A reconfigurable fabric for accelerating large-scale datacenter services.  Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger. ISCA 2014. Published as SIGARCH Comput. Archit. News 42, 3 (June 2014), 13-24.

GPU and TPU clusters have the "advantage" of being basically similar to general purpose computers, except for supporting highly parallel operations in hardware (ones matched to the needs of graphics programming, or tensor transformations).  But there are other interesting accelerators, too.

We'll look at FPGAs, which are a kind of hardware "filter" and "transformation" unit you can place right on the wire.
Tue Nov 13 23. Clusters of FPGAs and their relevance to ML/AI.  Microsoft's datacenter of FPGAs model.

A cloud-scale acceleration architecture. Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, Doug Burger. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers. SIGARCH Comput. Archit. News 43, 1 (March 2015), 223-238.

Journal version of the same paper: here. This has a little more detail and includes some additional experiments, but the shorter conference version is probably fine unless you find something puzzling or incomplete and want to read a little more.

Here is what we get with "grown-up" FPGAs, but the topic is fairly complex.  The key idea is that if you have enough FPGAs you can create big clusters that function as powerful hardware supercomputers for certain tasks, like audio (speech) and image (vision) processing.  People have been figuring out how to map deep neural networks into FPGA clusters.

The work is quite technical and we'll sort of skim it, with the goal of just being able to think about what an edge needs to look like if it will use tricks like this for "amazing performance."
Thu Nov 15 24. Software as an out-of-band control plane for data flows.  Barrelfish and Arrakis.  IX.

The multikernel: a new OS architecture for scalable multicore systems.  Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. 2009.  In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles (SOSP '09). ACM, New York, NY, USA, 29-44. 

Arrakis: The Operating System Is the Control Plane. Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. 2015. ACM Trans. Comput. Syst. 33, 4, Article 11 (November 2015), 30 pages.

These are two famous operating systems papers that argue for new OS designs aimed at better management of modern hardware.
Tue Nov 20 25. How the HPC people do it: MPI. I'm not totally sure about this lecture, but I am thinking we might learn about HPC because RDMA grew up as an HPC accelerator.
(Nov 21-25) Thanksgiving break, no class


Tue Nov 27 26.  MPI integration with RDMA. If we do go with MPI for lecture 25, this will be a lecture on libfabric from the OpenFabrics Alliance.
Thu Nov 29 27. Software-controlled data centers.

The Rise of the Programmable Data Center. Michael Vizard, 2012.

SDDC - software-defined data center. Webopedia article.

Tue Dec 4 28. Identifying open research questions.

In this last lecture of the semester, we'll cook up some research topics, hopefully one or more per student in the class.  Come with a few ideas.  These aren't topics you'll necessarily work on, but try to think up a topic "in your area".  We'll generate a list, and then look back over the topics covered this semester to try to tease out risks, to improve the topics or focus them, maybe to refocus some ideas that aren't quite right, etc.  Perhaps a few will become research papers from students who took CS6465!

The core of this last meeting will be to think in terms of consumers for each idea.  What is the context for the idea?  Who would read this paper or use this technique, and will the paper actually reach that kind of person?  What perspectives does the work need to emphasize to be successful with that community?