Kenneth P. Birman
Our main focus is on a new system called “Spinglass”
(http://www.cs.cornell.edu/Info/Projects/Spinglass). The idea is to explore a class of reliable multicast protocols that are extremely scalable, and provide unusually stable throughput under stress. We believe that stable throughput is a common requirement in demanding critical settings, but few reliable protocols have this property. By scalability, we mean that a system that works with ten computers should also be usable with ten thousand of them.
The Bimodal Multicast protocol underlying Spinglass appears to be a real step forward, because it not only provides this guarantee, but also has a unique form of reliability – a bimodal probabilistic delivery distribution, which can be formally derived and has been confirmed in practice. We see a tremendous number of fascinating challenges as we extend this into a full-scale system and develop tools to program with bimodal reliability. The work involves our whole group. Graduate students looking at Bimodal Multicast and the Spinglass per-se system include Zhen Xiao, who is focusing on wide-area networks, Indranil Gupta, who is looking at algorithms that make direct use of probabilistic guarantees, and Ranveer Chandra and Venugopalen Ramasubramanian, who are looking at ad-hoc mobile networking. Kate Jenkins and Ken Hopkinson are looking at applications that arise when using these protocols in real-world settings arising from the restructuring of the electric power grid.
In closely related work, my colleague Robbert van Renesse is exploring methods for building scalable databases of network management information in the context of a system he called Astrolabe. He also uses protocols based on “gossip,” the technique employed in Bimodal Multicast, but his work is optimized for many-to-many communication. Robbert and I are starting to collaborate with Al Demers on problems that relate Astrolabe to consistency issues seen in traditional database systems.
A second focus involves developing formal methods to prove properties of reliable communication protocols, such as those used in Isis, Horus, and Ensemble. These employ a reliability model called virtual synchrony, which provides a mixture of guarantees involving the membership of groups of processes in a dynamic distributed setting, handling of failures and recoveries, multicast to group members, and tools for initializing a joining process. Our work is joint with Robert Constable’s Nuprl project and Nancy Lynch’s group at MIT, and uses Lynch’s I/O Automata to formalize the behavior of Ensemble, after which Nuprl can be exploited to reason about and manipulate complex protocol stacks. R. van Renesse and graduate student Xioaming Liu are leading this activity.
Our third focus is on ways of embedding reliability tools into widely used software settings. Werner Vogels is pursuing this in a component-based system he calls Quintet. The emphasis of Quintet is on developing very scalable cluster solutions for scalable, high performance, fault-tolerant servers. R. van Renesse is doing something similar, but his focus is on system management issues, using a system he called the RMIB. Post-doctoral researcher Raoul Bhoedjang is developing an intrusion detection system based on Spinglass, while Ben Atkin is exploring mobile and nomadic applications. Our hope through these efforts is to understand the needs better, looking at real and important technology areas, to show how such problems can be solved, and to demonstrate the solutions in settings of commercial importance.
Finally, Tibor Janosi, Rimon Barr, and Adrian Bozdog are developing a reliable and scalable architecture for applications that would run on scalable cluster-style servers, in work supported by NASA JPL. A. Bozdog and R. Barr are looking at the kinds of applications NASA might run on clusters, while T. Janosi has a particular interest in financial and brokerage systems. Our effort will explore architecture and systems challenges in making such systems scale to support really large numbers of clients. Not only is the problem itself interesting, we hope to gain insight into how our new tools can be exploited in leading-edge application settings.
Very recently, we’ve teamed up with other researchers in computer science to create a group that will look at mobile and ad-hoc computer networks. We believe that the problems arising in such settings are natural fits with our work on Spinglass.
Our project is funded primarily by DARPA, with some additional funding from the Electric Power Research Institute and Microsoft Research. The project is directed by myself, R. van Renesse, and W. Vogels. R. Bhoedjang is visiting as a post-doc for a few years, and developing Intrusion Detection software to make use of Astrolabe.
Stephen ’57 and Marilyn Miles Excellence in Teaching Award, 2000.
Director: Graduate Studies, Computer Science.
Committees: Founding Committee for Faculty of Computing and Information Science; University Conflicts of Interest; Engineering College Policy.
Academic Leadership Series.
Bimodal Multicast. Invited lecture. 1st Workshop on Networked Group Communications, Pisa, Italy, November 1999.
17th Annual Brazilian Symposium on Computer Networks, Bimodal Multicast. Carnegie Mellon University, December 1999.
The Horus and Ensemble Projects: Accomplishments and Limitations. DISCEX ’99, Hilton Head, SC, January 2000.
—. Bimodal Multicast. Invited lecture. IBM, Westchester, NY, January 2000. The Next Generation Internet: Unsafe at any speed? Invited lecture. University of Texas at Austin, February 2000.
—. NMADS Conference, New York, NY, March 2000.
—. IBM, Westchester, NY, May 2000.
Keynote Speaker, Middleware 2000, NY, June 2000.
“Bimodal Multicast.” ACM Transactions on Computer Systems 17(2) (May 1999), 41-88 (with M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu and Y. Minsky).
“Optimized Group Rekey for Group Communication Systems.” Network and Distributed System Security 2000, San Diego, CA (February 2000) (with O. Rodeh and D. Dolev). (Extended version available as Cornell University, Computer Science TR99-1764.)
“A Probabilistically Correct Leadership Election Protocol for Large Groups.” To Appear, DISC-2000, Nov 2000, Toledo Spain (with I. Gupta and R. van Renesse).
“A Study of Group Rekeying.” Cornell University, Computer Science TR2000-1791, submitted to OSDI2000 (March 2000) (with D. Dolev and O. Rodeh).
“Agent Technology Applied to Adaptive Relay Settings for Multi-Terminal Lines.” Cornell University, Computer Science TR 2000-1792 (March 2000), submitted to the IEEE Summer Power Conference (with D.V. Coury, J.S. Thorp, and K.M. Hopkinson).
“Technology Challenges for Virtual Overlay Networks.” IEEE Systems, Man, and Cybernetics Information Assurance and Security Workshop, West Point, NY (June 2000). Extended version submitted to IEEE Systems Man and Cybernetics.
“A Simulation Model for an Epidemic Multicast Protocol.” Accepted for BAS2000 Conference (5th Computer Networks Symposium), Bilkent University, Ankara, Turkey (June 2000) (with O. Rodeh).
“Optimizing Buffer Management for Reliable Multicast.” Submitted to 2nd Annual Workshop on Networked Group Communication (NGC 2000), Palo Alto, CA (November 2000) (with Z. Xiao and R. van Renesse).
“Next Generation Internet: Unsafe at Any Speed?” Accepted for publication in IEEE Computer, Special Issue on Infrastructure Protection (August 2000).
“A Dynamic Light-Weight Group Service.” Journal of Parallel and Distributed Computing, accepted for publication (2000) (with L. Rodrigues, K. Guo, and P. Verissimo).
“Building Reliable, High-Performance Communication Systems From Components.” 17th ACM Symposium on Operating Systems Principles (SOSP 2000) (Dec 1999), Kiwah Island, SC, 80-92 (with X. Liu, C. Kreitz, R. van Renesse, J. Hickey, M. Hayden, and R. Constable).