Cornell Systems Lunch
CS 7490 Fall 2017
The Systems Lunch is a seminar for discussing recent, interesting papers in the systems area, broadly defined to span operating systems, distributed systems, networking, architecture, databases, and programming languages. The goal is to foster technical discussions among the Cornell systems research community. We meet once a week on Fridays at noon in Gates 114.
The systems lunch is open to all Cornell Ph.D. students interested in systems. First-year graduate students are especially welcome. Non-Ph.D. students have to obtain permission from the instructor. Student participants are expected to sign up for CS 7490, Systems Research Seminar, for one credit.
Links to papers and abstracts below are unlikely to work outside the Cornell CS firewall. If you have trouble viewing them, this is the likely cause.
|August 25||The Stellar Consensus Protocol: A Federated Model for Internet-level Consensus
|September 1||REM: Resource-Efficient Mining for Blockchains
Fan Zhang, Ittay Eyal, Robert Escriva, Ari Juels, and Robbert van Renesse
Usenix Security 2017
|September 8||Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding
|September 15||What (New) Bugs Live in the Cloud?
As more data and computation move from local to cloud environments, datacenter distributed systems have become a dominant backbone for many modern applications. However, the complexity of cloud-scale hardware and software ecosystems has outpaced existing testing, debugging, and verification tools.
I will describe three new classes of bugs that often appear in large-scale datacenter distributed systems: (1) distributed concurrency bugs, caused by non-deterministic timings of distributed events such as message arrivals as well as multiple crashes and reboots; (2) limpware-induced performance bugs, design bugs that surface in the presence of "limping" hardware and cause cascades of performance failures; and (3) scalability bugs, latent bugs that are scale-dependent and typically surface only in large-scale deployments (100+ nodes), not necessarily in small/medium-scale deployments.
The findings above are based on our long, large-scale cloud bug study (3000+ bugs) and cloud outage study (500+ outages). I will present some of our work in understanding and combating distributed concurrency bugs, mainly focusing on our semantic-aware implementation-level model checking (SAMC) and taxonomy of distributed concurrency bugs (TaxDC). If time permits, I will also briefly discuss limpware and scalability bugs.
Haryadi Gunawi is a Neubauer Family Assistant Professor in the Department of Computer Science at the University of Chicago where he leads the UCARE research group (UChicago systems research on Availability, Reliability, and Efficiency). He received his Ph.D. in Computer Science from the University of Wisconsin, Madison in 2009. He was a postdoctoral fellow at the University of California, Berkeley from 2010 to 2012. His current research focuses on cloud computing reliability and new storage technology. He has won numerous awards including an NSF CAREER award, an NSF Computing Innovation Fellowship, a Google Faculty Research Award, NetApp Faculty Fellowships, and an Honorable Mention for the 2009 ACM Doctoral Dissertation Award.
|Haryadi Gunawi (University of Chicago)|
|September 22||vCorfu: A Cloud-Scale Object Store on a Shared Log
Michael Wei, University of California, San Diego, and VMware Research Group; Amy Tai, Princeton University and VMware Research Group; Christopher J. Rossbach, The University of Texas at Austin and VMware Research Group; Ittai Abraham, VMware Research Group; Maithem Munshed, Medhavi Dhawan, and Jim Stabile, VMware; Udi Wieder and Scott Fritchie, VMware Research Group; Steven Swanson, University of California, San Diego; Michael J. Freedman, Princeton University; Dahlia Malkhi, VMware Research Group
|September 29||ViewMap: Sharing Private In-Vehicle Dashcam Videos
Minho Kim, Jaemin Lim, Hyunwoo Yu, Kiyeon Kim, Younghoon Kim, and Suk-Bok Lee, Hanyang University
|October 6||TensorFlow: A System for Large-Scale Machine Learning
Martín Abadi et al.
|October 13||Sub-millisecond Stateful Stream Querying over Fast-evolving Linked Data
Yunhao Zhang, Rong Chen, Haibo Chen (Shanghai Jiao Tong University)
|October 20||Building Automation Systems for the Enterprise
Despite advances in computer science, your typical large-scale enterprise company still runs primarily on "carbon" -- large numbers of human workers running the core business functions. This carbon workforce consists of millions of people worldwide performing manual, repetitive, and (nearly) deterministic tasks on a daily basis that in many cases a computer is much better equipped to perform.
In this talk, we will first discuss why enterprise companies (even "tech giants" in the Fortune 50) are still so far behind the technology curve and, with this understanding, how advances in computer science can help companies catch up by shifting this work to "silicon." Next, I will describe the technical and systems challenges that arise when building complex automation systems that are deployed in client environments.
|George Nychis (Soroco)|
|October 27||Programmable Topologies
Fiber optic cables are the workhorses of today’s Internet services. Operators spend millions of dollars to purchase, lease and maintain their optical backbone, making the efficiency of fiber essential to their business. In this talk, I will make a case for programmable topologies. ProjecToR [SIGCOMM’16] is a programmable data center interconnect that uses free-space optics between racks. Our design enables all rack-pairs to communicate via direct links. We use a digital micromirror device (DMD) and mirror assembly combination as a transmitter and a photodetector on top of the rack as a receiver. We built a prototype that points to the feasibility of our approach. Simulations and analysis show that, for realistic data center workloads, it can improve mean flow completion time by 30-95%, while reducing cost by 25-40%.

Next, I will present the results of the first-ever large-scale study on the performance of optical links in a backbone network carrying live traffic [HotNets’17]. Our data-driven analysis coupled with simulations showed that existing fiber deployment can be driven towards much greater efficiency by enabling programmable modulations. For example, 99% of Microsoft’s 100 Gbps channels can be augmented to 150 Gbps, by simply changing the modulation formats at the two ends without touching the fiber or intermediate amplifiers. Even better, 43% can double their capacity and carry up to 200 Gbps. This way, using the same fiber paths, we get more bits, less space, and less power. This project has moved the industry into adopting bandwidth variable transponders in the WAN.
Monia Ghobadi is a researcher at Microsoft Research, Redmond, WA. Her research interests include all aspects of networked systems. Currently, she leads the optical networking research in the Redmond lab. Her past work spans data center congestion control, RDMA, software-defined networks, and network measurement. This year, she was recognized as one of the N2Women Rising Stars in networking and communications. She received her Ph.D. from the University of Toronto and worked at Google’s data center team before joining Microsoft Research. Many of the technologies that she has helped develop are part of real-world systems at Microsoft and Google. Her papers have won the best dataset award (IMC 2016), a Google research excellent paper award (USENIX ATC 2012), and a best paper award (IMC 2008).
|November 3||Why Your Encrypted Database Is Not Secure
Paul Grubbs, Thomas Ristenpart, Vitaly Shmatikov
|November 10||Timely, Reliable, and Cost-Effective Internet Transport Service using Dissemination Graphs
Amy Babay, Emily Wagner, Michael Dinitz, and Yair Amir
|Amy Babay (JHU)|
|November 17||ACSU Luncheon, no meeting.|
|November 24||Thanksgiving Break, no meeting.|
|December 1||Programming with People
Humans can perform many tasks with ease that remain difficult or impossible for computers. Crowdsourcing platforms like Amazon's Mechanical Turk make it possible to harness human-based computational power on an unprecedented scale. However, their utility as a general-purpose computational platform remains limited. The lack of complete automation makes it difficult to orchestrate complex or interrelated tasks. Scheduling human workers to reduce latency costs real money, and jobs must be monitored and rescheduled when workers fail to complete their tasks. Furthermore, it is often difficult to predict the length of time and payment that should be budgeted for a given task. Finally, the results of human-based computations are not necessarily reliable, both because human skills and accuracy vary widely, and because workers have a financial incentive to minimize their effort.
This talk presents AutoMan, the first fully automatic crowdprogramming system. AutoMan integrates human-based computations into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. This abstraction allows AutoMan programmers to focus on their programming logic. An AutoMan program specifies a confidence level for the overall computation and a budget. The AutoMan runtime system then transparently manages all details necessary for scheduling, pricing, and quality control. AutoMan automatically schedules human tasks for each computation until it achieves the desired confidence level; monitors, reprices, and restarts human tasks as necessary; and maximizes parallelism across human workers while staying under budget.
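The abstraction described above can be illustrated with a minimal sketch in Python. This is not AutoMan's actual API (AutoMan is embedded in Scala, and its statistical quality-control machinery is far more sophisticated); the function names, prices, and the crude agreement test here are all hypothetical, chosen only to show the shape of the idea: a human task exposed as an ordinary function call that the runtime re-schedules until a confidence threshold is reached, while respecting a budget.

```python
import random

def crowd_compute(task, confidence=0.95, budget=1.00, price=0.06):
    """Hypothetical sketch of an AutoMan-style runtime: dispatch the
    human task (here simulated by calling `task()`) repeatedly, tally
    answers, and stop when agreement reaches the requested confidence
    or the budget would be exceeded."""
    votes = {}
    spent = 0.0
    while spent + price <= budget:
        answer = task()  # stand-in for posting one task to a crowd worker
        spent += price
        votes[answer] = votes.get(answer, 0) + 1
        total = sum(votes.values())
        top = max(votes.values())
        # crude agreement check standing in for a real statistical test
        if total >= 3 and top / total >= confidence:
            return max(votes, key=votes.get), spent
    return None, spent  # budget exhausted before reaching confidence

# usage: a simulated worker population that answers "yes" 90% of the time
random.seed(0)
result, cost = crowd_compute(lambda: "yes" if random.random() < 0.9 else "no")
```

The point of the sketch is the calling convention, not the statistics: the program states *what* it wants (an answer, a confidence, a budget) and the runtime decides how many workers to schedule and when to stop.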
|Emery Berger (UMass)|