Cornell Systems Lunch

CS 7490 Fall 2017
Friday 12PM, Gates 114

Emin Gun Sirer and Robbert van Renesse



The Systems Lunch is a seminar for discussing recent, interesting papers in the systems area, broadly defined to span operating systems, distributed systems, networking, architecture, databases, and programming languages. The goal is to foster technical discussions among the Cornell systems research community. We meet once a week on Fridays at noon in Gates 114.

The systems lunch is open to all Cornell Ph.D. students interested in systems. First-year graduate students are especially welcome. Non-Ph.D. students have to obtain permission from the instructor. Student participants are expected to sign up for CS 7490, Systems Research Seminar, for one credit.

To join the systems lunch mailing list please send an empty message to cs-systems-lunch-l-request@cornell.edu with the subject line "join". More detailed instructions can be found here.

Links to papers and abstracts below are unlikely to work outside the Cornell CS firewall. If you have trouble viewing them, this is the likely cause.

Date Paper Presenter
August 25 The Stellar Consensus Protocol: A Federated Model for Internet-level Consensus
David Mazieres
Isaac Sheff
September 1 REM: Resource-Efficient Mining for Blockchains
Fan Zhang, Ittay Eyal, Robert Escriva, Ari Juels, and Robbert van Renesse
USENIX Security 2017
Fan Zhang
September 8 Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding
SIGMOD 2016
Ayush Dubey
September 15 What (New) Bugs Live in the Cloud?

As more data and computation move from local to cloud environments, datacenter distributed systems have become a dominant backbone for many modern applications. However, the complexity of cloud-scale hardware and software ecosystems has outpaced existing testing, debugging, and verification tools.

I will describe three new classes of bugs that often appear in large-scale datacenter distributed systems: (1) distributed concurrency bugs, caused by non-deterministic timings of distributed events such as message arrivals as well as multiple crashes and reboots; (2) limpware-induced performance bugs, design bugs that surface in the presence of "limping" hardware and cause cascades of performance failures; and (3) scalability bugs, latent bugs that are scale dependent and typically surface only in large-scale deployments (100+ nodes), not necessarily in small- or medium-scale deployments.

The findings above are based on our long, large-scale cloud bug study (3000+ bugs) and cloud outage study (500+ outages). I will present some of our work in understanding and combating distributed concurrency bugs, mainly focusing on our semantic-aware implementation-level model checking (SAMC) and taxonomy of distributed concurrency bugs (TaxDC). If time permits, I will also briefly discuss limpware and scalability bugs.


Haryadi Gunawi is a Neubauer Family Assistant Professor in the Department of Computer Science at the University of Chicago, where he leads the UCARE research group (UChicago systems research on Availability, Reliability, and Efficiency). He received his Ph.D. in Computer Science from the University of Wisconsin, Madison in 2009. He was a postdoctoral fellow at the University of California, Berkeley from 2010 to 2012. His current research focuses on cloud computing reliability and new storage technology. He has won numerous awards, including an NSF CAREER award, an NSF Computing Innovation Fellowship, a Google Faculty Research Award, NetApp Faculty Fellowships, and an Honorable Mention for the 2009 ACM Doctoral Dissertation Award.

Haryadi Gunawi (University of Chicago)
September 22 vCorfu: A Cloud-Scale Object Store on a Shared Log
Michael Wei, University of California, San Diego, and VMware Research Group; Amy Tai, Princeton University and VMware Research Group; Christopher J. Rossbach, The University of Texas at Austin and VMware Research Group; Ittai Abraham, VMware Research Group; Maithem Munshed, Medhavi Dhawan, and Jim Stabile, VMware; Udi Wieder and Scott Fritchie, VMware Research Group; Steven Swanson, University of California, San Diego; Michael J. Freedman, Princeton University; Dahlia Malkhi, VMware Research Group
NSDI 2017
Youer Pu
September 29 ViewMap: Sharing Private In-Vehicle Dashcam Videos
Minho Kim, Jaemin Lim, Hyunwoo Yu, Kiyeon Kim, Younghoon Kim, and Suk-Bok Lee, Hanyang University
NSDI 2017
Edward Tremel
October 6 TensorFlow: A System for Large-Scale Machine Learning
Martín Abadi et al.
OSDI 2016
Matthew Milano
October 13 Sub-millisecond Stateful Stream Querying over Fast-evolving Linked Data
Yunhao Zhang, Rong Chen, Haibo Chen (Shanghai Jiao Tong University)
SOSP 2017
Yunhao Zhang
October 20 Building Automation Systems for the Enterprise

Despite advances in computer science, your typical large-scale enterprise company still runs primarily on "carbon" -- large numbers of human workers running the core business functions. This carbon workforce consists of millions of people worldwide performing manual, repetitive, and (nearly) deterministic tasks on a daily basis that in many cases a computer is much better equipped to perform.

In this talk, we will first discuss why enterprise companies (even "tech giants" in the Fortune 50) still lag so far behind the technology curve and, with this understanding, how advances in computer science can help companies catch up by shifting this work to "silicon." Next, we will describe technical and systems challenges that arise when building complex automation systems that are deployed in client environments,


Soroco

George Nychis (Soroco)
October 27 Programmable Topologies

Fiber optic cables are the workhorses of today’s Internet services. Operators spend millions of dollars to purchase, lease and maintain their optical backbone, making the efficiency of fiber essential to their business. In this talk, I will make a case for programmable topologies. ProjecToR [SIGCOMM’16] is a programmable data center interconnect that uses free-space optics between racks. Our design enables all rack-pairs to communicate via direct links. We use a digital micromirror device (DMD) and mirror assembly combination as a transmitter and a photodetector on top of the rack as a receiver. We built a prototype that points to the feasibility of our approach. Simulations and analysis show that, for realistic data center workloads, it can improve mean flow completion time by 30-95%, while reducing cost by 25-40%. Next, I will present the results of the first-ever large-scale study of the performance of optical links in a backbone network carrying live traffic [HotNets’17]. Our data-driven analysis coupled with simulations showed that existing fiber deployments can be driven towards much greater efficiency by enabling programmable modulations. For example, 99% of Microsoft’s 100 Gbps channels can be augmented to 150 Gbps simply by changing the modulation formats at the two ends, without touching the fiber or intermediate amplifiers. Even better, 43% can double their capacity and carry up to 200 Gbps. This way, using the same fiber paths, we get more bits, less space, and less power. This project has moved the industry into adopting bandwidth variable transponders in the WAN.


Monia Ghobadi is a researcher at Microsoft Research, Redmond, WA. Her research interests include all aspects of networked systems. Currently, she leads the optical networking research in the Redmond lab. Her past work spans data center congestion control, RDMA, software-defined networks, and network measurement. This year, she was recognized as one of the N2Women rising stars in networking and communications. She received her Ph.D. from the University of Toronto and worked at Google’s data center team before joining Microsoft Research. Many of the technologies that she has helped develop are part of real-world systems at Microsoft and Google. Her papers have won a best dataset award (IMC 2016), a Google research excellent paper award (USENIX ATC 2012), and a best paper award (IMC 2008).

Monia Ghobadi
November 3 Why Your Encrypted Database Is Not Secure
Paul Grubbs, Thomas Ristenpart, Vitaly Shmatikov
HotOS 2017
Paul Grubbs
November 10 Timely, Reliable, and Cost-Effective Internet Transport Service using Dissemination Graphs
Amy Babay, Emily Wagner, Michael Dinitz, and Yair Amir
ICDCS 2017
Amy Babay (JHU)
November 17 ACSU Luncheon, no meeting.
November 24 Thanksgiving Break, no meeting.
December 1 Programming with People

Humans can perform many tasks with ease that remain difficult or impossible for computers. Crowdsourcing platforms like Amazon's Mechanical Turk make it possible to harness human-based computational power on an unprecedented scale. However, their utility as a general-purpose computational platform remains limited. The lack of complete automation makes it difficult to orchestrate complex or interrelated tasks. Scheduling human workers to reduce latency costs real money, and jobs must be monitored and rescheduled when workers fail to complete their tasks. Furthermore, it is often difficult to predict the length of time and payment that should be budgeted for a given task. Finally, the results of human-based computations are not necessarily reliable, both because human skills and accuracy vary widely, and because workers have a financial incentive to minimize their effort.

This talk presents AutoMan, the first fully automatic crowdprogramming system. AutoMan integrates human-based computations into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. This abstraction allows AutoMan programmers to focus on their programming logic. An AutoMan program specifies a confidence level for the overall computation and a budget. The AutoMan runtime system then transparently manages all details necessary for scheduling, pricing, and quality control. AutoMan automatically schedules human tasks for each computation until it achieves the desired confidence level; monitors, reprices, and restarts human tasks as necessary; and maximizes parallelism across human workers while staying under budget.

Emery Berger (UMass)