Bringing Time-Sensitivity to Commodity Networks and Public Clouds (via Zoom)

Abstract: Distributed Systems and Packet-Switched Networks were developed in the 1970s under a "clockless design" paradigm.  This was mainly due to the difficulty of accurately synchronizing clocks over jittery packet-switched networks, and it caused a bifurcation whose effects are felt to this day: widely-used "commodity" networks (such as those in public clouds) offer "best effort" service, while networks using specialized hardware and protocols offer "high-performance" or "time-sensitive" services.  
Imagine clocks can be accurately synchronized at scale and at distance without the need for specialized hardware.  What implications would this have for Distributed Systems and Networking?
This talk will describe Huygens, a high-accuracy, software-based network clock synchronization system.  We will provide two examples of how Huygens can be used to transform jittery and unpredictable public cloud infrastructure into deterministic, time-sensitive systems. (1) Financial Trading. Exchanges are required to provide "fair access": a trader's order cannot be overtaken by another trader's order, and all traders should receive market data at the same time.  Currently, exchanges meet these requirements using carefully designed, bespoke data networks, which limits their scale and scope.  We show how accurate clock synchronization can be used to build fair financial exchanges on top of public clouds.  (2) "Zero-drop" Networks. Packet drops are intrinsic to the operation of packet-switched networks; specialized hardware and protocols (e.g., the IEEE 802.1Qbb Priority-based Flow Control standard) are used to prevent them.  Such a drop-free property is required for storage networks built on top of RDMA (Remote Direct Memory Access) technology and for networks that run large AI/ML workloads.  We describe a new protocol, called On-Ramp, which uses accurate clock synchronization to quickly detect path congestion and pause traffic at the network's edge.  We show how On-Ramp can provide near-zero packet losses.  We conclude by mentioning a few other examples where time-sensitivity and "timeliness" play a critical role.

Bio: Balaji Prabhakar is VMware Founders Professor of Computer Science and a faculty member in the Departments of Electrical Engineering and Computer Science, and, by courtesy, in the Graduate School of Business at Stanford University. His research interests are in computer networks, notably Data Center Networks and Cloud Computing Platforms. His work spans network algorithms, congestion control protocols, and stochastic network theory.  He has also worked on Societal Networks, where he has developed "nudge engines" to incentivize commuters to travel at off-peak times so that congestion, fuel and pollution costs are reduced.
Balaji has been a Terman Fellow at Stanford University, and a Fellow of the Alfred P. Sloan Foundation, IEEE and ACM. He has received the NSF CAREER Award, the Erlang Prize from the INFORMS Applied Probability Society, the Rollo Davidson Prize given to young statisticians and probabilists, and delivered the Lunteren Lectures of the Dutch Operations Research Society. He is the inaugural recipient of the IEEE Innovation in Societal Infrastructure Award, which recognizes "significant technological achievements and contributions to the establishment, development and proliferation of innovative societal infrastructure systems." He has received the IEEE Koji Kobayashi Award for his work on Computer Networks and the ACM SIGMETRICS Award for his work on Stochastic Network Theory.  He is a co-recipient of a few best-paper and test-of-time awards.  In 2005-07 he was Switch Architect at Nuova Systems (acquired by Cisco Systems in 2008), where he developed the fabric scheduling and line card algorithms of Cisco's Nexus 5000 family of data center Ethernet switches.  In 2011 he co-founded Urban Engines (acquired by Google in 2016) and is currently on leave at where he is co-founder and CEO.