Professor Kenneth P. Birman
Dept. of Computer Science
4119 Upson Hall, Cornell University
Ithaca, New York 14853

Work: 607-255-9199
Fax:    607-255-4428

 

 New!  Check out the awesome demo Krzys and Jong did of our new Live Objects platform (this will involve downloading two large files and playing the videos).  Then download the platform and use it to build something cool… let us know about your experience!  (Live Objects and Quicksilver are designed to be easy to use.)  The download site, which also has the same demo videos available on it, is http://liveobjects.cs.cornell.edu

Full publications list

Book:  I want to welcome users of my new textbook, "Reliable Distributed Systems: Technologies, Web Services, and Applications" (Springer-Verlag, March 2005).  I've been teaching from the book and am making materials available here.

TRUST National Science and Technology Center: I'm a founding member of TRUST, a consortium that brings researchers from Berkeley, Carnegie Mellon, Cornell, Stanford and Vanderbilt Universities together to explore a wide range of challenges in the area of "trustworthy computing".  The consortium will work with industry to tackle problems whose scale and complexity exceed what any of us could have handled individually, and where the topic brings a mixture of not just technical issues, but also social, legal, economic, or regulatory policy challenges.

Research: My research is concerned with reliable distributed computing, applications involving reliable information collection or dissemination, and problems associated with security in complex distributed systems. A recent focus has been on massive scalability -- building really big computing systems that remain stable even under stress, overload, or when components fail or are restarted.  Our most recent technologies relate to what we call “Live Distributed Objects” and to a new protocol called “Maelstrom”.   Details and downloads are available here.

Some examples of mission-critical systems on which my software was used in the past include the New York Stock Exchange and Swiss Exchange, the French Air Traffic Control system, the AEGIS warship and a wide range of applications in settings like factory process control and telephony.   In fact, every stock quote or trade on the NYSE from 1995 until early 2006 was reported to the overhead trading consoles through software I personally implemented - a cool (but also scary) image, for me at least!  During the ten years this software was in operation, many computers crashed during the trading day, and many other problems occurred - but the design we developed and implemented managed to reconfigure itself automatically and keep the overall system up, without exception.  They didn't have a single outage during this entire period.  As far as I know, the other organizations listed above have similar stories to report.

Today, these kinds of ideas are gaining "mainstream" status.  For example, IBM's WebSphere 6.0 product includes a multicast layer used to replicate data and other runtime state for high-availability web service applications and web sites.  Although IBM developed its own implementation of this technology, we've been told by the developers that the architecture was based on Cornell's Horus and Ensemble systems, described more fully below.  The CORBA architecture includes a fault-tolerance mechanism based on some of the same ideas.  And we've also worked with Microsoft on the technology at the core of the next generation of that company's clustering product.  So, you'll find Cornell's research not just on these web pages, but also on web sites worldwide and in some of the world's most ambitious data centers and high-availability computing systems.

A major emerging opportunity involves management of new kinds of networks.  The issue here is that applications such as secure conferencing systems need to administer resources at multiple locations and to do so in a secure, reliable way, perhaps while handling enormous data rates and rapid changes in configuration.  Some types of networks involve placing agents (for example, content filtering or overlay routing components) at large numbers of routers.  Thus, scalable replication, security policy and key management, system monitoring and control are rapidly becoming critical requirements.  Our work is directly applicable in such settings.

My group often works with vendors and industry researchers.  We maintain a very active dialog with the US government and military on research challenges emerging from the future-generation communication systems now being planned by organizations like the Air Force and the Navy.  We've even worked on new ways of controlling the electric power grid, but not in time to head off the big blackout in 2003!  Looking to the future, we are focused on needs arising in financial systems, large-scale military systems, and even health-care networks.

I'm just one of several members of a group in this area at Cornell.  My closest colleagues and co-leaders of the group are Robbert van Renesse, Einar Vollset and Hakim Weatherspoon (the latter two are visiting for two- or three-year periods).  We also collaborate with Gun Sirer, Paul Francis, Al Demers and Johannes Gehrke, as well as with other systems faculty members at Cornell: Andrew, Fred, etc.  The systems group is close-knit, and many of our students are jointly advised by other faculty members in the systems area.  Werner Vogels worked with us until September 2004, when he joined Amazon.com as Vice President and Director for Systems Research.

Four generations of reliable distributed systems research!  Overall, our group has developed three generations of technology and is now working on a fourth: the Isis Toolkit, developed mostly during 1987-1993; the Horus system, developed from 1990 until around 1995; and the Ensemble system, developed during 1995-1999.  Right now we're developing a number of new systems including Live Objects, Quicksilver, Maelstrom, Ricochet and Tempest.  These pull together a set of technologies based on peer-to-peer and epidemic-based protocols that offer remarkable stability and scalability under stress.  Isis, Horus and Ensemble focus on the virtual synchrony model for reliable multicast and group communication, while our new platforms target applications using a publish-subscribe programming model.  On the other hand, the old ideas aren't dead: Quicksilver will include a new implementation of virtual synchrony, designed to work well in settings with massive numbers of groups.

We're also developing a technology for search-and-rescue and other kinds of mobile teaming/cooperation, as part of a DARPA-funded consortium called ACERT.

Broadly, much of our current work uses techniques similar to the ones employed by the new generation of peer-to-peer file systems.   We depart from this prior work by combining peer-to-peer communication with a type of protocol called a "gossip" or "epidemic" dissemination mechanism.  We also use a very different style of data structure (the hierarchical structures used in our work are purely abstractions, while the tree structures in systems like Pastry, Tapestry or Chord are "real").    The resulting mixture of solutions scales well, has fascinating quality-of-service properties, and offers great performance.  For example, our new Ricochet protocol slashes latencies while offering incredible throughput and reliability relative to past work on reliable multicast.  The connection to peer-to-peer communication is that Ricochet gets its speedups by computing what we call "lateral error correction" packets, which are shared using a protocol related to gossip.
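For readers who haven't met gossip protocols before, here is a minimal sketch of the basic idea, written in Python purely for illustration.  It is not code from Ricochet or any of our systems: the function name gossip_rounds, the fanout parameter and the round-based simulation are all assumptions made for this example.  In each round, every node that already has an update pushes it to a few randomly chosen peers, with no central server involved.

```python
import random


def gossip_rounds(num_nodes: int, fanout: int = 2, seed: int = 0) -> int:
    """Simulate simple push gossip (a hypothetical toy model).

    Returns the number of rounds until every node has the update.
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 starts out holding the update
    rounds = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for node in informed:
            # Draw fanout + 1 distinct candidates, then drop the sender
            # itself (if drawn) so we keep at most `fanout` other peers.
            candidates = rng.sample(range(num_nodes), min(fanout + 1, num_nodes))
            peers = [p for p in candidates if p != node][:fanout]
            newly_informed.update(peers)
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, "nodes:", gossip_rounds(n), "rounds")
```

Running the script for a few system sizes gives a feel for why this style of protocol scales: in this toy model the informed set can at most triple per round with a fanout of 2, and in practice the number of rounds grows slowly as the number of nodes grows.  Real protocols add many things this sketch leaves out, such as message loss, membership changes and rate limits.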

Ricochet is just one example among many.  We're finding that by mixing new peer-to-peer ideas with classical protocols, we can obtain exciting mixtures of scalability, reliability and even security guarantees, and that we can present the solutions in a number of ways (they don't look anything like peer-to-peer file systems, despite using a peer-to-peer style of communication).  There are no centralized servers or other single points of failure.

The best papers to read if you want to get a sense of what we are doing are Quicksilver Scalable Multicast; Ricochet: Low-Latency Multicast for Scalable Time-Critical Services; Astrolabe: A Robust and Scalable Technology for Distributed Monitoring, Management and Data Mining; A Churn-Resistant Peer-to-Peer Web Caching System; and Bimodal Multicast.  Much more information and links to many more papers can be found on our research web pages.

Research web pages:
              Live Objects, Quicksilver, Maelstrom, Ricochet and Tempest projects
              Ensemble project
              Horus project
              Isis project (old stuff!)

A collection of papers on Isis, which I edited with Robbert van Renesse, may still be available -- it was called Reliable Distributed Computing with the Isis Toolkit and was in the IEEE Press Computer Science series.

Teaching:  During Fall 2006 and Spring 2007, I'll probably teach CS614 (PhD oriented) and then CS514 (MEng oriented). 

Graduate Studies in Computer Science at Cornell:  At this time of the year, we get large numbers of inquiries about our PhD program.  I want to recommend that people interested in the program not contact faculty members like me directly with routine questions like "can your research group fund me".  As you'll see from the web page, Cornell does admissions by means of a committee, so individual faculty members don't normally play a role.  This is different from many other schools -- I realize that at many places, each faculty member admits people into her/his own group.  But at Cornell, we admit you first, then you come here, and then you affiliate with a research group after a while.  Funding is absolutely guaranteed for people in the MS/PhD program during the whole time they are at Cornell.  On the other hand, students in the MEng program generally need to pay their own way.

Obviously, some people have more direct, specific questions, and there is no problem sending those to me or to anyone else.  But as for the generic "can I join your research group?" the answer is that while I welcome people into the group if they demonstrate good ideas and talent in my area, until you are here and take my graduate course and spend time talking with me and my colleagues, how can we know if the match is good?  And most such inquiries are from people who haven't yet figured out quite how many good projects are underway at Cornell.  Perhaps, on arrival, you'll take Andrew Myers's course in language-based security and will realize this is your passion.  So at Cornell, we urge you to take time to find out what areas we cover and who is here, to take some courses, and only then affiliate with a research group.  But please knock on my door any time you like!  I'm more than happy to talk to any student in the department about anything we're doing here!