Cornell Systems Lunch
CS 7490 Spring 2021
The Systems Lunch is a seminar for discussing recent, interesting papers in the systems area, broadly defined to span operating systems, distributed systems, networking, architecture, databases, and programming languages. The goal is to foster technical discussions among the Cornell systems research community. We meet once a week, on Fridays at 11:40, online via Zoom only.
The Systems Lunch is open to all Cornell Ph.D. students interested in systems. First-year graduate students are especially welcome. Non-Ph.D. students must obtain permission from the instructor. Student participants are expected to enroll in CS 7490, Systems Research Seminar, for one credit.
Links to papers and abstracts below are unlikely to work outside the Cornell CS firewall. If you have trouble viewing them, this is the likely cause.
February 12
Serving DNNs like Clockwork: Performance Predictability from the Bottom Up (video)
Arpan Gujarati, Max Planck Institute for Software Systems; Reza Karimi, Emory University; Safya Alzayat, Wei Hao, and Antoine Kaufmann, Max Planck Institute for Software Systems; Ymir Vigfusson, Emory University; Jonathan Mace, Max Planck Institute for Software Systems
Ymir Vigfusson (Emory)
February 19
Building Storage Systems for New Applications and New Hardware (video)
Abstract: The modern storage landscape is changing at an exciting rate. New technologies, such as Intel DC Persistent Memory, are being introduced. At the same time, new applications such as blockchain are emerging with new requirements for the storage subsystem. New regulations, such as the General Data Protection Regulation (GDPR), place new constraints on how data may be read and written. As a result, designing storage systems that satisfy these constraints is both interesting and challenging. In this talk, I will describe the lessons we learned from tackling this challenge in various forms: my group has built file systems and concurrent data structures for persistent memory, built storage solutions for blockchains and machine learning, and analyzed how the GDPR affects storage systems.
Bio: Vijay Chidambaram is an Assistant Professor in the Computer Science department at the University of Texas at Austin. He did his post-doc at the VMware Research Group, and received his PhD under Profs. Remzi and Andrea Arpaci-Dusseau at the University of Wisconsin-Madison. His papers have won Best Paper Awards at ATC 2018, FAST 2018, and FAST 2017. He received the NSF CAREER Award in 2018, the SIGOPS Dennis M. Ritchie Dissertation Award in 2016, and the Microsoft Research Fellowship in 2014. Techniques from his work have been incorporated into commercial products, and his work has helped make the Linux kernel more reliable.
February 26
Byzantine Ordered Consensus without Byzantine Oligarchy (video)
Yunhao Zhang, Cornell University; Srinath Setty, Qi Chen, and Lidong Zhou, Microsoft Research; Lorenzo Alvisi, Cornell University
March 5
No lecture. Discussion of re-occupancy of the systems lab.
March 12
Meerkat: Scalable Replicated Transactions Following the Zero-Coordination Principle (video)
Adriana Szekeres, Michael Whittaker, Naveen Kr. Sharma, Jialin Li, Arvind Krishnamurthy, Dan Ports, Irene Zhang
March 19
Towards a User-Defined and Truly Serverless Cloud (video)
Since the launch of Amazon Web Services (AWS) in 2006, cloud computing has gone through several paradigm shifts, from a niche market that rents physical machines to the biggest IT sector, selling a variety of managed services. Most recently, serverless computing, a paradigm that promises to relieve users of the IT burden of managing servers, has quickly gained popularity. Despite the tremendous development in cloud services, the underlying data-center infrastructure is largely the same as it was 15 years ago, and the same as in non-cloud environments: network-connected servers, each equipped with some processor, memory, and storage. Is server-based infrastructure the best fit for cloud computing? Going forward, what should cloud computing look like, and what data-center infrastructure should it run on?
This talk will try to answer these questions by presenting my lab's efforts over the past few years in building a truly serverless cloud. Such a cloud runs on a "disaggregated" data-center infrastructure, which breaks monolithic servers into network-attached hardware devices and forms resource pools by logically combining devices of the same type. Hardware resources can be allocated and scaled at fine granularity, and resource pools can be individually managed and customized for different application needs. On top of this infrastructure, the truly serverless cloud allows users to run their code with "unlimited" resources of arbitrary types and to pay only for what their code uses. Specifically, this talk will cover a new OS and a new hardware platform that we built for a disaggregated data center. I will also demonstrate how "serverless" cloud services can run in and benefit from such a data center. Finally, I will briefly discuss our vision of a future "user-defined cloud", one in which users define their own cloud services by specifying the hardware resource needs, system software features, and security requirements of their applications, without needing to build or manage low-level systems.
Yiying Zhang (UCSD)
March 26
Memory Disaggregation: Think Outside the Box (video)
Data growth has turned memory into a major bottleneck. To cope with this bottleneck, memory-hungry applications are increasingly taking advantage of memory "outside the box", leveraging advances in memory technologies and networking to increase their effective memory capacity. Disaggregated, or "far", memory is attached to the network and can be accessed by remote processors without mediation from a local processor. Disaggregated memory architectures provide benefits to applications beyond those of traditional scale-out environments, such as independent scaling of compute and memory resources, and an independent failure model that leaves data resident in disaggregated memory unaffected by compute node failures. Some hurdles remain, however: although network bandwidth is improving, network latency still dominates the latency of disaggregated memory, which makes good performance challenging to achieve. In this talk, I will review the trends that motivate disaggregated memory and outline the benefits that disaggregated memory architectures provide for applications. I will present highlights from recent work on managing data resident in disaggregated memory efficiently, through clever data structure design, intelligent caching, and efficient sharing, potentially leveraging minimal memory-side acceleration. I will also discuss how disaggregated memory can be exploited to improve application fault tolerance, by borrowing and adapting ideas from task-based programming models, concurrent programming techniques, and lock-free data structures. I will conclude by outlining some open challenges going forward.
Bio: Dr. Kimberly Keeton is a former Distinguished Technologist at Hewlett Packard Labs. Her recent research focuses on data management for disaggregated persistent memory architectures. She has also worked in the areas of storage and information management, NoSQL databases, intelligent storage, and workload characterization. Her work has led to over 60 publications in refereed journals and conferences and 22 granted US patents, and has contributed to multiple products. She was a co-architect of the Express Query database, which provides metadata services for HPE's StoreAll archiving solution. She has served as Technical Program Committee (PC) Chair for multiple top-tier research conferences on computer systems and storage, including the USENIX Symposium on Operating Systems Design and Implementation (OSDI), ACM SIGMETRICS, the ACM European Systems Conference (EuroSys), the USENIX Conference on File and Storage Technologies (FAST), and the IEEE/IFIP Dependable Systems and Networks Performance and Dependability Symposium (DSN/PDS). She acts as an industrial advisor to university research groups at Carnegie Mellon, ETH Zurich, and the University of California at Berkeley. Kim received her PhD and MS in Computer Science from UC Berkeley, and her BS in Computer Engineering and in Engineering and Public Policy from Carnegie Mellon. She is a Fellow of the ACM and the IEEE.
Kim Keeton (HPE)
April 2
Scaling Community Cellular Networks with CommunityCellularManager (video)
Shaddi Hasan, UC Berkeley; Mary Claire Barela, University of the Philippines, Diliman; Matthew Johnson, University of Washington; Eric Brewer, UC Berkeley; Kurtis Heimerl, University of Washington
April 9
The Facebook Delos Storage System (video)
Delos is a storage system at the bottom of the Facebook stack, operating as the backing store for the Twine scheduler and processing more than 2 billion transactions per day. Delos has a shared log design (based on Corfu), separating the consensus protocol from the database via a shared log API. In this talk, I'll describe two ways in which Delos advances the state of the art for replicated systems. First, we virtualize consensus by virtualizing the shared log: as a result, the system can switch between entirely different log implementations (i.e., consensus protocols) without downtime, as described in our OSDI 2020 paper. Second, we virtualize the replicated state machine above the shared log, splitting the logic of the database into separate, stackable layers called log-structured protocols (this is ongoing work). We leveraged virtualization in Delos to implement and deploy two different databases in production (a Table store and a ZooKeeper clone) on a common codebase and operational platform. Virtualization also improved reliability, allowing individual components within Delos to be upgraded without downtime or complex migration logic.
Mahesh Balakrishnan (Facebook)
April 23
Wellness Day: no systems lunch, no meeting.
April 30
ACSU Luncheon: no systems lunch, no meeting.
May 7
CrocodileDB: Towards Resource-Efficient Execution by Exploiting Time Slackness
A critical challenge in providing query service in the cloud is improving resource efficiency, which requires database systems to minimize resource consumption (and thus users' monetary cost) while meeting users' performance goals. Existing databases for maintaining standing queries (e.g., maintaining a dashboard report over a data stream) either significantly reduce query latency by eagerly consuming all available resources (e.g., stream processing), or minimize resource consumption by lazily processing new data at the expense of query latency (e.g., batch processing). Existing eager and lazy execution approaches are optimized for applications at the two ends of the resource-latency trade-off, but the middle ground between them is rarely exploited. We find that many applications lie in this middle ground, where users tolerate some time slackness to reduce resource consumption (i.e., dollar cost) but still want to see timely results. We propose CrocodileDB, a new database architecture that exploits this time slackness information to reduce resource consumption, and allows users to tune the slackness to adjust query latency and resource consumption for their applications.
Bio: Dixin Tang is currently a postdoctoral researcher in the Data Systems and Foundations group at UC Berkeley, working with Prof. Aditya Parameswaran. Before joining UC Berkeley, he obtained his Ph.D. at the University of Chicago, advised by Prof. Aaron J. Elmore. His research interests include interactive data analysis, query optimization for resource-efficient query execution, and providing query service in the cloud. He currently leads Dataspread, a data analysis tool that combines the intuitiveness and flexibility of spreadsheets with the scalability and power of databases. At UChicago, he worked on CrocodileDB, a new database architecture that unifies batch processing and continuous query processing, and provides resource-efficient query service in the cloud.