Kenneth P. Birman
N. Rama Rao Professor of Computer Science
4119 Upson Hall, Cornell University
Ithaca, New York 14853
Work: 607-255-9199
Mobile: 607-227-0894
Email: ken@cs.cornell.edu
Recent News: The Isis2 System is available for download!
· Isis2 is our new platform for building scalable, strongly consistent cloud computing services. I developed it over a two-year period; the system is going to be our main focus for the next few years, and we intend to make it work well even in massive scenarios. Of course, nothing ever works outside the cases we've tested, and up to now our testing has focused on replication in services with up to about 1000 member processes. Consistency can range from a weak form of durability, ideal for first-tier "soft state" cloud services, to a much stronger form that matches what Paxos offers but scales far beyond what normal Paxos systems can do (we know because the system also includes a full Paxos implementation and we’ve compared the two side by side). Configurations with far more than 1000 nodes are planned for 2012. In addition to the basics (group communication, the virtual synchrony model, etc.), we also have a key-value option that shards data within a big group using a user-determined replication factor (think of Amazon's Dynamo shopping cart), as well as security features, automated logging and checkpoint-recovery, and a wide range of other features.
· To learn more, visit the Isis2 web page.
· To learn even more, wait for the new edition of Ken's Reliable Distributed Systems textbook. In a totally biased manner, the new edition of my textbook will expand coverage of cloud computing, using Isis2 as its "model" for how to solve high-assurance cloud problems. Of course this isn't as unfair as it may sound, since the major cloud platforms mostly lack anything like Isis2. The book should be out early in 2012, from Springer Verlag. I’m teaching Cornell’s CS5412 Cloud Computing class this spring; my materials are online and you are welcome to use them.
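For readers who want a feel for what "sharding data within a big group, using a user-determined replication factor" means, here is a rough sketch in Python. It is only an illustration of the general idea and is not Isis2 code or the Isis2 API: the function name `shard_members`, the member names, and the choice of hashing a key to pick consecutive members are all assumptions made for the example.

```python
import hashlib


def shard_members(key, members, replication_factor):
    """Pick which group members hold replicas of `key`.

    Hypothetical helper, not part of Isis2: the key is hashed to a starting
    position in the (ordered) member list, and the next `replication_factor`
    members, wrapping around, form the shard for that key.
    """
    if not 1 <= replication_factor <= len(members):
        raise ValueError("replication factor must be between 1 and the group size")
    # Use a stable hash so every process computes the same shard for a key.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(members)
    return [members[(start + i) % len(members)] for i in range(replication_factor)]


if __name__ == "__main__":
    # A made-up group of ten members and a made-up shopping-cart key.
    group = ["node%d" % i for i in range(10)]
    print(shard_members("cart:alice", group, 3))
```

Every member that knows the current group membership computes the same shard for a given key, so an update only needs to reach the members in that shard rather than the whole group. What happens when membership changes is deliberately left out of this sketch.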
Current Research (full publications list):
Older work.
I've really worked in Cloud
Computing for most of my career, although it obviously wasn't called cloud
computing in the early days. As a result, our papers in this area date back to
1985. Some examples of mission-critical systems on which my software was used
in the past include the New York Stock Exchange and Swiss Exchange, the French
Air Traffic Control system, the AEGIS warship and a wide range of applications
in settings like factory process control and telephony. In fact, every stock
quote or trade on the NYSE from 1995 until early 2006 was reported to the
overhead trading consoles through software I personally implemented - a cool
(but also scary) image, for me at least! During the ten years this system was
running, many computers crashed during the trading day, and many network
problems occurred - but the design we developed and implemented
reconfigured itself automatically and kept the overall system up,
without exception. The NYSE didn't have a single trading disruption during the entire
period. As far as I know, the other organizations listed above have similar
stories to report.
Today, these kinds of ideas are
gaining "mainstream" status. For example, IBM's WebSphere
6.0 product includes a multicast layer used to replicate data and other runtime
state for high-availability web service applications and web sites. Although
IBM developed its own implementation of this technology, we've been told by the
developers that the architecture was based on Cornell's Horus and Ensemble
systems, described more fully below. The CORBA architecture includes a
fault-tolerance mechanism based on some of the same ideas. And we've also
worked with Microsoft on the technology at the core of the next generation of
that company's clustering product. So, you'll find Cornell's research not just
on these web pages, but also on web sites worldwide and in some of the world's
most ambitious data centers and high availability computing systems.
In fact we still have very active
dialogs with many of these companies: Cisco, IBM, Intel, Microsoft, Red Hat,
and others. An example of an ongoing dialog is this: we’ve been working with
Cisco to invent a new continuous availability option for their core Internet
routers, the CRS-1 series. You can read about this work here.
A major emerging opportunity
involves management of new kinds of networks. The issue here is that
applications such as secure conferencing systems need to administer resources
at multiple locations and to do so in a secure, reliable way, perhaps while
handling enormous data rates and rapid changes in configuration. Some types of
networks involve placing agents (for example, content filtering or overlay
routing components) at large numbers of routers. Thus, scalable replication,
security policy and key management, system monitoring and control are rapidly
becoming critical requirements. Our work is directly applicable in such
settings.
My group often works with vendors
and industry researchers. We maintain a very active dialog with the US
government and military on research challenges emerging from a future
generation of communication systems now being planned by organizations like the
Air Force and the Navy. We've even worked on new ways of controlling the
electric power grid, but not in time to head off the big blackout in 2003!
Looking to the future, we are focused on needs arising in financial systems,
large-scale military systems, and even health-care networks. (In this connection,
I should perhaps mention that although we do get research support from the
government and the US military, none of our research is classified or even
sensitive, and all of it focuses on widely used commercial standards and
platforms. Most of our software is released for free, under open source
licenses.)
I'm just one of several members of a
group in this area at Cornell. My closest colleagues and co-leaders of the
group are Robbert van Renesse and Hakim Weatherspoon.
We also collaborate with Gun Sirer, Paul Francis, Al
Demers and Johannes Gehrke, as well as with other
systems faculty members at Cornell: Andrew, Fred, Rafael, Joe, etc. The systems
group is close-knit, and many of our students are jointly advised by other
faculty members in the systems area. Werner Vogels
worked with us until September 2004, when he joined Amazon.com as Vice
President and Director for Systems Research.
Four generations of reliable distributed systems research! Overall, our group has developed three generations of technology
and is now working on a fourth. The first three were the Isis Toolkit, developed
mostly during 1987-1993; the Horus system, developed from 1990 until
around 1995; and the Ensemble system, developed during 1995-1999. Right now
we're developing a number of new systems including Isis2, Gradient,
and the reliable TCP solution mentioned above, and working with others to
integrate those solutions into settings where reliability, security,
consistency and scalability are make-or-break requirements.
Older Research web pages:
Live Objects,
Quicksilver, Maelstrom, Ricochet and Tempest projects
Ensemble
project
Horus project
Isis project
(really old stuff!)
A collection of papers on Isis,
edited by Robbert van Renesse and me, may still
be available -- it was called Reliable Distributed Computing with the Isis
Toolkit and was in the IEEE Press Computer Science series.
Teaching: During Spring 2012 I'll be running an
MEng-oriented course on cloud computing: CS5412.
Please consult the course
web page for more information. We are hoping to make videonotes
from the classes available on the course web site, and all of my slide sets are
available for public download and use.
Graduate Studies in Computer Science
at Cornell: At this time of the year, we get
large numbers of inquiries about our PhD program. I want to recommend that
people interested in the program not contact faculty members like me
directly with routine questions like "can your research group fund
me?" As you'll see from the web page, Cornell does admissions by means of
a committee, so individual faculty members don't normally play a role. This is
different from many other schools -- I realize that at many places, each faculty
member admits people into her/his own group. But at Cornell, we admit you
first, then you come here, and then you affiliate with a research group after a
while. Funding is absolutely guaranteed for people in the MS/PhD program during
the whole time they are at Cornell. On the other hand, students in the MEng program generally need to pay their own way.
Obviously, some people have more
direct, specific questions, and there is no problem sending those to me or to
anyone else. But as for the generic "can I join your research group?"
the answer is that while I welcome people into the group if they demonstrate
good ideas and talent in my area, until you are here and take my graduate
course and spend time talking with me and my colleagues, how can we know if the
match is good? And most such inquiries are from people who haven't yet figured
out quite how many good projects are underway at Cornell. Perhaps, on arrival,
you'll take Andrew Myers's course in language-based security and will realize
this is your passion. So at Cornell, we urge you to take time to find out what areas we cover and who is here, to take some courses, and
only then affiliate with a research group. But please knock on my door any time
you like! I'm more than happy to talk to any student in the department about
anything we're doing here!