The Systems group at Cornell examines the design and
implementation of the fundamental software systems that form
our computing infrastructure.
Below we give a small sample of the varied systems work
going on here, and invite you to visit the project and faculty web pages,
as well as to read the papers.
Operating Systems
Cornell is one of the few institutions worldwide where brand-new
research operating systems are being actively developed. Sirer and
Schneider have recently built the Nexus
(SOSP 2011,
TOISS 2011, SIGCOMM 2011,
OSDI 2008),
a new operating system that enables trustworthy computing. The
Nexus enables a new class of applications that can provide
unprecedented confidentiality and integrity guarantees; as a
demonstration, they have built a social networking application, called
Fauxbook, whose users can be assured that their information cannot
be leaked, even with Fauxbook developers acting as adversaries.
Hakim Weatherspoon is currently working on multi-core extensions to
the Linux operating system, as well as file system mirroring across
high-bandwidth, high-latency links
(FAST 2009).
Fred Schneider built a
replicated UNIX system using virtual machine technology
(SOSP 1995, ACM TOCS 1996). Emin Gun Sirer developed a new architecture for
networked virtual machines (SOSP 1999), and was the main kernel
designer for the SPIN extensible operating system (SOSP 1995).
Cloud Computing
The increasing popularity of cloud storage is leading organizations to
consider moving computing and data out of their own data centers and
into the cloud. However, success for cloud storage providers can present
a significant risk to customers; namely, it becomes very expensive to
switch storage providers. Hakim Weatherspoon and his students
have been working on various solutions to reduce lock-in risks for
cloud customers
(SOCC 2010, HotCloud 2011).
Emin Gun Sirer's group has proposed an unorthodox topology
for datacenters that eliminates all hierarchical switches in favor
of connecting nodes at random according to a small-world-inspired
distribution
(SOCC 2011).
Surprisingly, these Small-World Datacenters can achieve higher bandwidth
and fault tolerance than conventional hierarchical datacenters.
Hakim Weatherspoon and his students are looking
at designing datacenter storage systems that are frugal with energy use
(HotOS 2007).
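To make the small-world wiring idea mentioned above concrete, here is a minimal sketch. It is not the topology from the SOCC 2011 paper: the ring base lattice, the 1/distance link weighting, and the names `small_world_ring` and `average_hops` are all assumptions chosen for illustration. It only shows how a few random long-range links shorten paths on a regular lattice.

```python
import random
from collections import deque


def ring_distance(i, j, n):
    """Hop distance between nodes i and j on a plain ring of n nodes."""
    d = abs(i - j)
    return min(d, n - d)


def small_world_ring(n, extra_links=1, seed=0):
    """Build a ring of n nodes, then give each node `extra_links` random
    long-range links, favoring nearby nodes (weight ~ 1/ring distance).
    Hypothetical illustration only; returns an adjacency dict of sets."""
    if n < 4:
        raise ValueError("need at least 4 nodes so every node has a non-neighbor")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):  # regular lattice links
        j = (i + 1) % n
        adj[i].add(j)
        adj[j].add(i)
    for i in range(n):  # random long-range links
        candidates = [j for j in range(n) if j != i and j not in adj[i]]
        weights = [1.0 / ring_distance(i, j, n) for j in candidates]
        for j in rng.choices(candidates, weights=weights, k=extra_links):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_hops(adj):
    """Mean shortest-path length over all ordered node pairs (BFS per node)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    ring = small_world_ring(64, extra_links=0)
    wired = small_world_ring(64, extra_links=2)
    print("ring only:  ", average_hops(ring))
    print("small world:", average_hops(wired))
```

Comparing the two printed averages shows the effect of the shortcuts; a real datacenter design would also have to account for port counts, routing, and cabling, none of which this sketch models.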
Other cloud computing work includes Birman's
Isis2 project.
Isis2 is a library for replicating
data or computing in cloud services coded in standard object-oriented
languages like Java, C# or C++: it automates such tasks as replicating
updates, keeping checkpoints and encrypting data for security.
Van Renesse and his students are working on scalable replication techniques
for the cloud. They have recently developed a technique called
Elastic Replication, in which configuration management is carefully
separated from ordering updates to replicas. Replicated objects
are cooperatively responsible for each other's configuration. As a
result, management can be aggressively decentralized for scalability while
strong consistency is maintained.
For summaries of additional cloud computing research at Cornell see
cloudcomputing.cornell.edu.
Distributed Systems and Fault Tolerance
Cornell is particularly well-known for its foundational and practical work
on fault-tolerant distributed systems.
Ken Birman's book on reliable distributed systems is widely used in
classrooms and industry (a new edition will be published early in 2012).
His ISIS toolkit system
(SOSP 1985,
SOSP 1987)
was used extensively in industry for building fault-tolerant systems for
decades, and Birman is currently building a new version, Isis2,
aimed at scalable reliability for cloud computing systems (described
under Cloud Computing above); other such systems created by Birman and Van Renesse in the past
decade include Ensemble
(SOSP 1999).
Fred Schneider's oft-referenced and ACM Hall of Fame award-winning
State Machine Replication tutorial is standard fare in systems courses
around the world
(ACM Computing Surveys 1990).
Van Renesse and Schneider invented and analyzed the
Chain Replication paradigm
(OSDI 2004),
which is now used by several large Internet services.
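As a rough illustration of the Chain Replication idea, the sketch below arranges replicas in a line: updates enter at the head and are forwarded replica by replica, the tail acknowledges them, and reads are answered by the tail. The class names `Replica` and `Chain` are hypothetical, the forwarding is done with direct method calls rather than network messages, and failure detection and chain reconfiguration are left out entirely.

```python
class Replica:
    """One node in the chain; forwards each update to its successor."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.successor = None  # next replica in the chain, None at the tail

    def apply_update(self, key, value):
        self.store[key] = value
        if self.successor is not None:
            return self.successor.apply_update(key, value)
        # Only the tail acknowledges, so an ack means every replica has the update.
        return f"ack from {self.name}"


class Chain:
    """A fixed chain of replicas (no failure handling in this sketch)."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        for current, nxt in zip(self.replicas, self.replicas[1:]):
            current.successor = nxt

    def write(self, key, value):
        # Updates always enter at the head.
        return self.replicas[0].apply_update(key, value)

    def read(self, key):
        # Queries are answered by the tail, which only holds fully replicated updates.
        return self.replicas[-1].store.get(key)


if __name__ == "__main__":
    chain = Chain(["head", "middle", "tail"])
    print(chain.write("x", 1))
    print(chain.read("x"))
```

Serving reads from the tail is what gives strong consistency in this arrangement: a value becomes visible only after it has passed through every replica.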
Birman, Schneider, and Van Renesse were all major contributors to the 2010 Springer book
"Replication: Theory and Practice", a comprehensive book on replication techniques.
Van Renesse and Schneider are currently investigating
building robust distributed systems based on stepwise refinement.
Andrew Myers and his students are working on Fabric, a federated,
distributed system for securely and reliably storing, sharing, and
computing information (SOSP 2009).
Fabric presents a single-system image of all
resources that can be named by it, and provides security guarantees
to mutually distrusting principals using it, but it is a decentralized
system with no centralized security enforcement mechanism.
Van Renesse and his students developed Nysiad
(NSDI 2008),
a system that implements a new technique for transforming
a scalable distributed system or network protocol
tolerant only of crash failures into one that tolerates arbitrary
failures, including such failures as freeloading and malicious attacks.
Currently, he is working with Robert Constable of the NuPrl group on
automatically synthesizing fault-tolerant algorithms such as consensus.
Networking
Emin Gun Sirer and his students have proposed a technique to create
more expressive, futuristic networks that enable users to query them
about their properties (SIGCOMM 2011). Called NetQuery, this technique
leverages emerging secure coprocessors to establish ground truths
about network elements, and facilitates automated reasoning about
whether a network exhibits a particular characteristic (e.g., capacity,
low loss rate, or availability of redundant paths). In separate
work, Sirer's group proposed a new way of distributing multimedia
content using a hybrid peer-to-peer architecture (SIGCOMM 2011). This
scheme combines the advantages of client-server systems with the high
capacity of peer-to-peer systems, and improves on both of these
architectures by quantitatively evaluating the marginal benefit of
available bandwidth to competing consumers, enabling efficient
utilization of upload bandwidth. Nate Foster, in cooperation with
researchers at Princeton, has been developing high-level languages,
such as Frenetic, for
programming distributed collections of enterprise network switches
(ICFP 2011,
HotNets 2011,
POPL 2012).
The languages allow modular reasoning about network properties.
Ken Birman, Robbert van Renesse, Hakim Weatherspoon, and their
students are working with researchers from academia and industry
on the Nebula project, whose goal is to address threats to the cloud
while meeting the challenges of flexibility, extensibility and
economic viability (IEEE Internet Computing 2011). One artifact that
came out of this work is TCPR, a tool that fault-tolerant applications
can use to recover their TCP connections after crashing or migrating;
it masks the application failure and enables transparent recovery, so
the remote peers remain unaware. Another artifact under development
is SoNIC (Software-defined Network Interface Card), which provides
precise and reproducible measurements of an optical lambda network. By
achieving extremely high levels of precision, SoNIC can shed light on
the complexities of flows that traverse high-speed networks (IMC 2010,
DSN 2010).
Andrew Myers and Emin Gun Sirer developed Trickles (NSDI 2005, ACM TOCS 2008), a novel
TCP-like transport protocol and a new interface to replace sockets
that together enable all state to be kept on one endpoint, allowing
the other endpoint, typically the server, to operate without any
per-connection state.
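The idea of a server that keeps no per-connection state can be sketched as follows. This is not the Trickles protocol or its wire format: the function names, the JSON encoding, the `SERVER_KEY` constant, and the use of an HMAC to protect client-held state are assumptions made for illustration. The server hands the client an opaque "continuation" with every reply, and the client must present it with the next request.

```python
import hashlib
import hmac
import json

SERVER_KEY = b"demo-secret-key"   # hypothetical key known only to the server
DOCUMENT = b"stateless servers"   # the data being transferred


def _sign(state: dict) -> str:
    payload = json.dumps(state, sort_keys=True).encode()
    return hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()


def make_continuation(state: dict) -> dict:
    """Package the transfer state so the client can carry it for the server."""
    return {"state": state, "mac": _sign(state)}


def handle_request(continuation: dict, chunk_size: int = 4):
    """Serve one chunk using only the state supplied in the request.
    The server remembers nothing about the client between calls."""
    if not hmac.compare_digest(_sign(continuation["state"]), continuation["mac"]):
        raise ValueError("continuation failed integrity check")
    offset = continuation["state"]["offset"]
    data = DOCUMENT[offset:offset + chunk_size]
    return data, make_continuation({"offset": offset + len(data)})


def fetch_all() -> bytes:
    """Client loop: echo back the latest continuation with every request."""
    continuation = make_continuation({"offset": 0})  # stands in for a handshake
    received = b""
    while True:
        data, continuation = handle_request(continuation)
        if not data:
            return received
        received += data


if __name__ == "__main__":
    print(fetch_all())
```

Because every request is self-describing, any server replica holding the key could answer it, which is the property that makes this style attractive for load balancing and failover.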
Peer-to-Peer Systems
Cornell faculty have done extensive work in the Peer-to-Peer networking area,
ranging from file sharing to media streaming to network monitoring.
Emin Gun Sirer and his students have created a large number of P2P
systems, including Blindfold (IPTPS 2010), a scheme to ensure
that the operators of content aggregators are completely
blind to the content that they are storing and serving,
thereby eliminating the possibility of censoring content
at the servers;
Antfarm (NSDI 2009), a content distribution system based on managed swarms;
Octant (NSDI 2007), a system for geolocation of
Internet hosts; and
Beehive, a peer-to-peer replication system on which a new DNS, an Internet-scale Publish-Subscribe system and a new content distribution network were built
(NSDI 2006,
SIGCOMM 2004). Sirer's
Credence system
(NSDI 2005) for determining authentic content on peer-to-peer systems was deployed on the Gnutella network and led to a large-scale user study.
The Karma system explored peer-to-peer currencies long before Bitcoin.
Van Renesse developed Fireflies, a Byzantine-tolerant
P2P overlay network (EuroSys 2006).
Van Renesse and Birman developed the highly scalable Astrolabe
network monitoring system (IPTPS 2002, ACM TOCS 2003), now used at a
major e-retailer.
Weatherspoon
designed and implemented the Antiquity system, a secure P2P storage
facility (EuroSys 2007).
Cross-Cutting Research Areas
Besides the topics mentioned above, the systems faculty is also
actively involved with cross-cutting research involving
Security,
Programming Languages,
Computer Architecture,
and
Theory.
One example of cross-cutting research is the
Meridian (SIGCOMM 2005) project, where Sirer and his group applied some of the insights from Jon Kleinberg's theoretical work on small-world networks to peer-to-peer systems. Meridian and its successor, Cubit, can route queries to their destinations efficiently over a lightweight overlay, providing new search functionality not possible with preceding peer-to-peer systems. The Cubit plugin for Vuze was downloaded by more than 30,000 people.
Another example of cross-cutting research is the work Birman's group has done
on Web 2.0 collaboration. This effort looked
at the challenges of using Web 2.0 technologies (mashups) in support of
demanding collaboration applications, such as one sees in the military
or in hospitals. As part of this effort, they built a mashup technology
of their own, Live Objects
(ECOOP 2008, Middleware 2008). Have a look at the demo.
Environment
The Systems Group at Cornell prides itself on its collegial internal
environment. Our Systems Lunches, where professors and graduate students
get together every Friday to have lunch and discuss recent, cutting-edge
papers in the field, draw an attendance of 40-60 people and have been
adopted (blatantly copied!) by many other institutions. And Cornell's
Systems Lab, a large collaborative space with wall-to-wall whiteboards,
projectors, sound systems and work areas for up to three dozen people,
has served as a crucible where people hack together on projects and design
new systems.
People

Ken Birman
Distributed computing, fault-tolerant network systems, distributed systems security, large-scale network applications.

Nate Foster
Programming languages, networking, data management, security, type systems, provenance.

Andrew Myers
Programming languages, security, mobile code, persistent and distributed objects.

Fred B. Schneider
Distributed systems security and fault-tolerance, mobile code, concurrent programming, secure OS.

Emin Gun Sirer
Operating systems, cloud computing, networking, distributed systems, large-scale networked services and extensible systems.

Robbert van Renesse
Distributed computing, peer-to-peer networking, scalability, fault tolerance, adaptive networking.

Hakim Weatherspoon
Distributed computing, large-scale storage systems, energy-aware computing, operating systems.
Related Links
Architecture
Programming Languages
Security
Systems Lunch Seminar
Cornell Systems Laboratory
Cloud Computing at Cornell
Past and Ongoing Projects
CobWeb
CoDoNS
Corona
Credence
Cubit
Frenetic
Herbivore
Isis2
Karma
Magnetos
Meridian
Nexus
SHARP
Sqrt(s)
Sextant
SNS
Trickles