CS6465: Emerging Cloud Technologies and Systems Challenges
Hollister Hall Room 320, Tuesday/Thursday 1:25-2:40
CS6465 is a PhD-level class in systems that tracks emerging cloud computing technology, opportunities and challenges. It is unusual among CS graduate classes: the course is aimed at a small group of students, uses a discussion-oriented style, and the main "topic" is actually an unsolved problem in computer systems. The intent is to think about how one might reduce that open problem to subproblems, learn about prior work on those, and extract exciting research questions. The PhD focus centers on that last agenda element.
In this second offering, we plan to focus on issues raised by moving machine learning to the edge of the cloud. In this framing, edge computing still occurs within the data center, but for reasons of rapid response, it involves smart functionality close to the client, under time pressure. So you would think of an AI or ML algorithm written in as standard a way as possible (perhaps TensorFlow, or Spark/Databricks using Hadoop, etc.). But whereas normally that sort of code runs deep in the cloud, many minutes or hours from when data is acquired, the goal now is to keep the code unchanged (or minimally changed) and be able to run it on the stream of data as it flows into the system, milliseconds after it was acquired. We might also push aspects of machine-learned behavior right out to the sensors.
This idea is a big new thing in cloud settings -- they call it "edge" computing or "intelligent" real-time behavior. But today edge computing often requires totally different programming styles than back-end computing. Our angle in CS6465 is really to try to understand why this is so: could we more or less "migrate" code from the back-end to the edge? What edge functionality would this require? Or is there some inherent reason that the techniques used in the back-end platforms simply can't be used in the edge, even with some sort of smart tool trying to help?
The goal of this focus on an intelligent edge is, of course, to motivate research on the topic. Ken is a systems person, and his group is thinking about how to build new infrastructure tools for the intelligent edge. Those tools could be the basis of great research papers and might have real impact. But others work in this area too, and we'll want to read papers they have written.
Gaps can arise at other layers too. For example, TensorFlow is hugely popular at Google in the AI/ML areas, and Spark/Databricks plus Hadoop (plus Kafka, Hive, HBase, ZooKeeper, not to mention MATLAB, SciPy, GraphLab, Pregel, and a gazillion other tools) are insanely widely used. So if we assume that someone is a wizard at solving AI/ML problems using this standard infrastructure, but now wants parts of their code to work on an intelligent edge, what exactly would be needed to make that possible? Perhaps we would need some new knowledge representation, or at least some new way of storing knowledge, indexing it, and searching for it. This would then point to opportunities for research at the AI/ML level as well as opportunities in databases or systems to support those new models of computing.
CS6465 runs as a mix of discussions and short mini-lectures (mostly by the professor), with some small take-home topics that might require a little bit of out-of-class research, thinking and writing. There won't be a required project or any exams, and the amount of written material required will be small, perhaps a few pages to hand in per week. Grading will mostly be based on in-class participation.
CS6465 can satisfy the same CS graduate requirements (in the systems area) as any other CS6xxx course we offer. Pick the course closest to your interests, no matter what you may have heard. CS6410 has no special status.
Schedule and Readings/Slides
The following schedule is just a conceptual overview. It still has more blank slots than actual plan and the plan itself will evolve.
We will be reading a lot of papers from the main conferences, but part of the puzzle here is to figure out which papers are relevant to our topic. So this may look like a series of lectures by Ken on standard cloud stuff, but in practice we will often be discussing one or more published papers germane to our main topic. The class will work as a team to identify those papers -- a good experience for later in your studies when you will be doing literature searches on your own. So at least some classes will devote a fair amount of time to discussing what papers we need to read, and everyone will need to find an example or two and make the case for looking at it.
Which conferences are the ones most relevant to "intelligent edge" computing? This is a curious question too. Offhand, "intelligent computing" leads to NIPS and KDD. But "edge computing" would focus us more on SoCC, NSDI, SOSP, OSDI. Then there are networking conferences like SIGCOMM, and "broad agenda" conferences like DSN, ICDCS, EuroSys, ATC, LADIS (some of these are ACM conferences, some are from IEEE, and a few are from USENIX). The data conferences could have relevant papers too (VLDB, SIGMOD), and the same goes for the real-time conferences, like RTSS. So there are a lot of "candidate" conferences. We'll probably focus mostly on papers that appeared in the past five years. There may be some interesting papers in journals too: TOCS, TOPLAS... Still, five years times perhaps 15 conferences and perhaps a further 5 journals would give us maybe 100 venue-years to scan, with maybe an average of 20 papers each, hence 2000 or so candidate papers. We only really plan to read one or two per lecture.
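For anyone who wants to check the back-of-the-envelope estimate above, here is a minimal sketch in Python. The numbers are just the rough guesses quoted in the text, and the function name candidate_papers is made up for illustration; plug in your own guesses to see how quickly the search space grows.

```python
# Rough size of the literature-search space, using the guesses quoted in the text.
YEARS = 5                       # "the past five years"
CONFERENCES = 15                # "perhaps 15 conferences"
JOURNALS = 5                    # "perhaps a further 5 journals"
PAPERS_PER_VENUE_PER_YEAR = 20  # "maybe an average of 20 papers each"


def candidate_papers(years, conferences, journals, papers_per_venue_year):
    """Return (venue_years, papers): the venue-year count and the paper count."""
    venue_years = years * (conferences + journals)
    return venue_years, venue_years * papers_per_venue_year


venue_years, papers = candidate_papers(
    YEARS, CONFERENCES, JOURNALS, PAPERS_PER_VENUE_PER_YEAR
)
print(f"{venue_years} venue-years, roughly {papers} candidate papers")
```

With the figures above this comes to 100 venue-years and about 2000 papers, which is why the class has to be selective about what it reads.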
The biggest issues for the edge, as opposed to a normal back-end cloud, are that:
Then we have a different kind of issue to think about:
Even this list betrays a bias: as a "platform" person, Ken's bias is a little bit towards systems. The machine learning applications that run on those systems are important -- the client generates the workload. But even so, we want to think of our client applications in pretty general, black-box terms. The other puzzle is that while we know a lot about the successful back-end ecosystems (we'll focus on Apache in CS6465 but in fact Amazon, Azure, Google and others all have elaborate specialized ones), this concept of a smart edge is nascent and hence there is little detail because it has yet to be invented. Figuring out what the technology "roles" will be is a good place to start, and then we can ask what candidates exist for populating those roles.