|
EMİN GÜN SİRER | |
|
Associate Professor
4151 Upson Hall |
(607) 255-7673 (607) 255-4428 fax Turn on JavaScript to view email address public key |
|
I work on self-organizing systems, which span operating systems, networking and distributed systems. I like building things, especially systems that have some principled reason for why they should work.
My current projects involve peer-to-peer systems, systems support for ad hoc networks, and operating systems.
Peer-to-peer Systems | |
| Beehive CoDoNS CobWeb CorONA |
Beehive is a high-performance distributed hash table. A novel optimization technique enables Beehive to respond to queries quickly, tolerate denial of service attacks, and balance load. Beehive provides strong performance guarantees even in the presence of queries drawn from Power Law distributions, previously thought to be difficult because heuristics-based approaches tried in the past do not work well with such distributions. We have used Beehive to build new, resilient infrastructure services for the Internet. CoDoNs is a safety net and a replacement for the Domain Name System that provides strong security, performance, and fast dynamic updates for existing Internet names. CobWeb is an Akamai-like open-access content distribution network. CorONA is a high-performance publish-subscribe system for web micronews. |
| Octant | Octant is a system for determining the physical location of Internet hosts. Given a host, Octant determines the boundaries of the region in which the node is likely to lie. Behind the scenes, Octant consists of two parts: a comprehensive framework for efficiently representing and combining a system of constraints, and a set of mechanisms for extracting tight constraints on node locations without resulting in an overconstrained system. |
| Sqrt(S) |
|
| Meridian | Meridian is a peer-to-peer overlay network for performing location-aware node and path selection in large-scale distributed systems. It is simple to deploy, robust to churn, and can accurately find the nearest node, pick the most centrally placed node, and find a node that fits latency constraints. |
| Credence | Credence is a reputation system for peer-to-peer networks, designed to provide an accurate metric for the trustworthiness of labels associated with shared files. It differs from previous work in that it derives its trust metric from principled measures that reflect likelihood of similar behavior between peers, has a completely distributed architecture with no trusted nodes, and a concrete implementation. Credence can guard against Sybil attacks and other malicious behavior from spammers. The Credence implementation is free, open-source, and backwards compatible with Gnutella. |
| CorSSO | CorSSO is a distibuted authentication service that provides network identities that span multiple application services, also known as single sign-on. It enables authentication functionality to be factored out of application services and delegated to combinations of authentication servers. It uses threshold cryptography for efficiency, fault tolerance and resiliance against attackers. |
| Karma | Karma is a virtual currency for use in keeping track of users' resource contribution and consumption in peer-to-peer systems. The karma system provides a secure exchange mechanism for a self-regulating, incentive-compatible, decentralized currency. |
| Herbivore | Herbivore is a peer-to-peer, self-organizing, robust system for anonymous communication. It uses dining cryptographer networks to provide anonymity guarantees even in the presence of attackers which can eavesdrop on every packet in the network. It's a follow-on to the CliqueNet project. |
Operating Systems | |
| Nexus | I am building a new operating system for trustworthy computing. It is built from scratch, with security and trusted computing in mind. We have been hacking on it for a few months now and it can boot, send mail, play movies, and run many other applications. |
| Trickles | Trickles is a high-performance protocol for stateless, connection-oriented communication. It comprises a transport protocol to replace TCP and a new interface to replace sockets that allow all server-side state to be shipped to clients. This leads to applications that are more scalable and robust against denial-of-service attacks. And it enables a new class of services: since a stateless network stack allows packets to be serviced at any server replica regardless of past history, Trickles enables fast transparent failover, fine-grain load balancing and connection-oriented anycast services to be implemented transparently inside the network fabric. |
| Portos | Portos is an emulation-based system and a set of corresponding projects that I use to teach introductory-level operating systems. The projects cover traditional topics, as well as new issues (such as routing and self-organization) raised by mobile, ubiquitous computing. The base emulation framework emulates a virtual processor, with attached virtual devices, on Windows NT (including 98/ME/XP/etc.) and CE. It runs on x86 and StrongArm processors. It's freely available. |
| Kimera |
The goal of the Kimera project is to enable networks of computers that
are cheaper, more secure and more manageable than what we have
now. The problem with current state of the art systems, like Java, is that network clients are monolithic, that is, they implement all requisite
system services locally. Consequently, every endpoint needs to
have sufficient resources to support services like verification,
compilation and security management, which require too much memory
and processing power for embedded devices. Further, each endpoint
entails associated software state that is hard to secure and manage.
These problems become particularly acute as the number of clients
increases.
The Kimera project addresses these problems by factoring system components out of clients into network servers. The clients can thus be smaller because they do not have to support complex services locally. Further, the overall network is easier to manage because service functionality is centralized. The services operate by intercepting applications as they are fetched by clients; they inspect applications, and where necessary, inject code snippets into them to provide the requisite service functionality. Digital signatures ensure that only applications that have been vetted by centralized services execute on the clients. |
| SPIN |
The goal of the SPIN project is to enable applications to customize
operating system functionality by downloading application-specific
extensions dynamically into the kernel. The extensions, as well as the
core system, are written in Modula-3, whose type-safety ensures load
and store protection. A namespace management interface restricts the
interfaces that applications can access, and creates multiple
software protection domains within a single privileged address space.
The SPIN execution model enables applications to safely extend the kernel with thread packages and hierarchical scheduling policies. A combination of language and system mechanisms provide security and fault-isolation while mediating access to processors. The SPIN protection domain interface simultaneously allows isolation and safe fine-grain sharing within a privileged address space. Extensions reside in hierarchical namespaces which they can use to share or to hide code and data at the granularity of interfaces. Use of hierarchical capabilities simplifies security management. Extensions that want to safely share code and data can do so without dynamic protection enforcement overhead. The SPIN web server uses all of these mechanisms to implement high-performance web service. |
| MIPSI | MIPSI is a robust and extensive MIPS instruction set simulator. It has been used in many classes and research projects. |
Ad Hoc Networks | |
| MagnetOS | MagnetOS is an operating system for ad hoc networks. It makes the entire network appear as a single Java virtual machine. It enables applications to be constructed easily and to execute efficiently. |
| Sextant | Sextant is a general framework for discovering the location of nodes and events in wireless networks. Sextant enables nodes without specialized hardware, such as GPS, to efficiently discover their approximate locations. Further, given a sensor network where each node can sense events in its immediate vicinity, Sextant pinpoints the location of events with high fidelity. |
| SHARP | SHARP is a hybrid routing protocol that dynamically finds the optimal mix of proactive route dissemination and reactive route discovery to achieve application-specific performance goals. |
| SNS | SNS is a scalable, high-performance wireless network simulator, based on ns2. It vastly outperforms standard ns2 in speed and scale through a new technique we developed called staged simulation. It achieves its improvements in speed and scale by eliminating redundant computations both within a single and across multiple simulation runs. It has been carefully validated against ns2 - staging preserves accuracy while speeding up simulations. |
| Spring 2007 | CS 615, Peer-to-Peer Systems | A graduate course on peer-to-peer systems. |
| The Cornell Systems Lunch | We will be reading recent papers on systems topics. |
Previous courses I have taught.
Peer-to-peer systems, in which participants pool their resources to accomplish their goals, have become ubiquitous. Since rational peers engage in strategic behavior, much past work has examined the design of mechanisms that incentivize peers to provide resources. The predominant design paradigm to date has been tit-for-tat, in which each peer benefits from every interaction. In this paper, we discuss the limitations of the tit-for-tat design paradigm and make the case for a new approach based on seeking the globally optimal outcome known as the common good.
Existing distributed hash tables provide efficient mechanisms for storing and retrieving a data item based on an exact key, but are unsuitable when the search key is similar, but not identical, to the key used to store the data item. In this paper, we present a scalable and efficient peerto-peer system with a new search primitive that can efficiently find the k data items with keys closest to the search key. The system works via a novel assignment of virtual coordinates to each object in a high-dimensional, synthetic space such that the proximity between two points in the coordinate space is correlated with the similarity between the strings that the points represent. We examine the feasibility of this approach for efficient, peer-to-peer search on inexact string keys, and show that the system provides a robust method to handle key perturbations that naturally occur in applications, such as file-sharing networks, where the query strings are provided by users.
Determining the physical location of Internet hosts is a critical enabler for many new location-aware services. In this paper, we present Octant, a novel, comprehensive framework for determining the location of Internet hosts in the real world based solely on network measurements. The key insight behind this framework is to pose the geolocalization problem formally as one of error-minimizing constraint satisfaction, to create a system of constraints by deriving them aggressively from network measurements, and to solve the system geometrically to yield the estimated region in which the target resides. This approach gains its accuracy and precision by taking advantage of both positive and negative constraints, that is, constraints on where the node can and cannot be, respectively. The constraints are represented using regions bounded by Bezier curves, allowing precise constraint representation and low-cost geometric operations. The framework can reason in the presence of uncertainty, enabling it to gracefully cope with aggressively derived constraints that may contain errors. An evaluation of Octant using PlanetLab nodes and public traceroute servers shows that Octant can localize the median node to within 22 mi., a factor of three better than other evaluated approaches.
Failure detectors are fundamental building blocks in distributed systems. Multi-node failure detectors, where the detector is tasked with monitoring N other nodes, play a critical role in overlay networks and peer-to-peer systems. In such networks, failures need to be detected quickly and with low overhead. Achieving these properties simultaneously poses a difficult tradeoff between detection latency and resource consumption. In this paper, we examine this central tradeoff, formalize it as an optimization problem and analytically derive the optimal closed form formulas for multi-node failure detectors. We provide two variants of the optimal solution for optimality metrics appropriate for two different deployment scenarios. Sqrt(s)-LM is a latency-minimizing optimal failure detector that achieves the lowest average failure detection latency given a fixed bandwidth constraint for system maintenance. Sqrt(s)-BM is a bandwidth-minimizing optimal failure detector that meets a desired detection latency target with the least amount of bandwidth consumed. We evaluate our optimal results with node lifetimes chosen from bimodal and Pareto distributions, as well as real-world trace data from PlanetLab hosts, web sites and Microsoft PCs. Compared to standard failure detectors in wide use, sqrt-s failure detectors reduce failure detection latencies by 40% on average for the same bandwidth consumption, or conversely, reduce the amount of bandwidth consumed by 30% for the same failure detection latency.
Security modifications to legacy network protocols are expensive and disruptive. This paper outlines an approach, based on external security monitors, for securing legacy protocols by deploying additional hosts that locally monitor the inputs and outputs of each host executing the protocol, check the behavior of the host against a safety specification, and communicate using an overlay to alert other hosts about invalid behavior and to initiate remedial actions. Trusted computing hardware provides the basis for trust in external security monitors. This paper applies this approach to secure the Border Gateway Protocol, yielding an external security monitor called N-BGP. N-BGP can accurately monitor a BGP router using commodity trusted computing hardware. Deploying N-BGP at a random 10% of BGP routers is sufficient to guarantee the security of 80% of Internet routes where both endpoints are monitored by N-BGP. Overall, external security monitors secure the routing infrastructure using trusted computing hardware and construct a security plane for BGP without having to modify the large base of installed routers and servers.
This paper outlines a novel, comprehensive framework for geolocalization, that is, determining the physical location of Internet hosts based on network measurements. The core insight behind this framework is to pose the geolocalization problem formally as one of error-minimizing constraint satisfaction, to create a system of constraints by deriving them aggressively from network measurements, and to solve the system using cheap and accurate geometric methods. The framework is general and accommodates both positive and negative constraints, that is, constraints on where the node can or cannot be, respectively. It can reason in the presence of uncertainty, enabling it to gracefully cope with aggressively derived constraints that may contain errors. Since the solution space is represented geometrically as a region bounded by Bezier curves, the framework yields an accurate set of all points where the target may be located. Preliminary results on PlanetLab show promise; the framework can localize the median node to within 22 miles, a factor of three better than previous approaches, with little error.
The web has failed to fulfill its promise of delivering relevant news and information in a timely fashion. In fact, it does not deliver anything on its own at all; instead, it requires its users to explicitly poll information sources. Checking for updates by pointing, clicking, and reloading Web sites, whether the sites are Slashdot, news, or online classifieds, is not only slow, inefficient, and cumbersome for users, but it places an unnecessary bandwidth burden on content providers. Recent attempts to automate this process, with the aid of feed readers, have created more problems than they have solved. A system that detects updates to content anywhere on the Web and delivers it to users via an asynchronous channel, such as an instant message, would do much to relieve the burden on users and content providers alike.
Despite the abundance of frequently changing information, the Web lacks a publish-subscribe interface for delivering updates to clients. The use of naive polling for detecting updates leads to poor performance and limited scalability as clients do not detect updates quickly and servers face high loads imposed by active polling. This paper describes a novel publish-subscribe system for the Web called Corona, which provides high performance and scalability through optimal resource allocation. Users register interest in Web pages through existing instant messaging services. Corona monitors the subscribedWeb pages, detects updates efficiently by allocating polling load among cooperating peers, and disseminates updates quickly to users. Allocation of resources for polling is driven by a distributed optimization engine that achieves the best update performance without exceeding load limits on content servers. Large-scale simulations and measurements from PlanetLab deployment demonstrate that Corona achieves orders of magnitude improvement in update performance at a modest cost.
In this paper, we describe Credence, a decentralized object reputation and ranking system for large-scale peer-to-peer filesharing networks. Credence counteracts pollution in these networks by allowing honest peers to assess the authenticity of online content through secure tabulation and management of endorsements from other peers. Our system enables peers to learn relationships even in the absence of direct observations or interactions through a novel, flow-based trust computation to discover trustworthy peers. We have deployed Credence as an overlay on top of the Gnutella filesharing network, with more than 10,000 downloads of our client software to date. We describe the system design, our experience with its deployment, and results from a long-term study of the trust network built by users. Data from the live deployment shows that Credence's flow-based trust computation enables users to avoid undesirable content. Honest Credence clients can identify three quarters of the decoys encountered when querying the Gnutella network.
ClosestNode.com is an accurate, scalable, and backwards-compatible service for mapping clients to a nearby server. It provides a DNS interface by which unmodified clients can look up a service name, and get the IP address of the closest server. A shared system for performing such a mapping amortizes the administration and implementation costs of proximity-based server selection. It is aimed at minimizing the amount of effort required for system developers to make new and existing infrastructure services proximity-aware.
This paper examines replication in content distribution networks and proposes a novel mechanism for optimally resolving performance versus cost tradeoffs. The key insight behind our work is to formally and analytically capture the relationship between performance, bandwidth overhead and storage requirements for a web cache, express the system goals as a mathematical optimization problem, and solve for the optimal extent of replication that achieves the desired system goals with minimal overhead. We describe the design and implementation of a new content distribution network based on this concept, called CobWeb. CobWeb can achieve a target lookup latency while minimizing network and storage overhead, minimize access time while keeping bandwidth usage below a set limit, and alleviate flash crowd effects by rapidly replicating popular objects through fast and highly adaptive replica management. We outline the architecture of the CobWeb system, describe its novel optimization algorithm for intelligent resource allocation, and compare, through simulations and a physical deployment on PlanetLab, CobWebs informed, analysis-driven replication strategy to existing approaches based on passive caching and heuristics.
While publish-subscribe systems have attracted much research interest since the last decade, few established benchmarks have emerged, and there has been little characterization of how publish-subscribe systems are used in practice. This paper examines RSS, a newly emerging, widely used publish-subscribe system for Web micronews. Based on a trace study spanning 45 days at a medium-size academic department and periodic polling of approximately 100,000 RSS feeds, we extract characteristics of RSS content and usage. We find that RSS workload resembles the Web in content size and popularity; feeds are typically small (less than 10KB), albeit with a heavy tail, and feed popularity follows a power law distribution. The update rate of RSS feeds is widely distributed; 55% of RSS feeds are updated hourly, while 25% show no updates for several days. And, only small portions of RSS content typically change during an update; 64% of updates involve less than three lines of the RSS content. Overall, this paper presents an analysis of RSS, the first widely deployed publish-subscribe system, and provides insights for the design of next generation publish-subscribe systems.
The Domain Name System, DNS, is based on nameserver delegations, which introduce complex and subtle dependencies between names and nameservers. In this paper, we present results from a large scale survey of DNS, and show that these dependencies lead to a highly insecure naming system. We report specically on three aspects of DNS security: the properties of the DNS trusted computing base, the extent and impact of existing vulnerabilities in the DNS infrastructure, and the ease with which attacks against DNS can be launched. The survey shows that a typical name depends on 46 servers on average, whose compromise can lead to domain hijacks, while names belonging to some countries depend on a few hundred servers. An attacker exploiting well-documented vulnerabilities in DNS nameservers can hijack more than 30% of the names appearing in the Yahoo and DMOZ.org directories. And certain nameservers, especially in educational institutions, control as much as 10% of the namespace.
Peer-to-peer filesharing is now commonplace and its traffic now dominates bandwidth consumption at many Internet peering points. Recent studies indicate that much of this filesharing activity involves corrupt and polluted files. This paper describes Credence, a new object-based reputation system, and shows how it can counteract content pollution in peer-to-peer filesharing networks. Credence allows honest peers to assess the authenticity of online content by securely tabulating and managing endorsements from other peers. We employ a novel voter correlation scheme to weigh the opinions of peers, which gives rise to favorable incentives and system dynamics. We present simulation results indicating that our system is scalable, efficient, and robust.
In this paper, we describe the design and implementation of a distributed operating system for ad hoc networks. Our system simplifies the programming of ad hoc networks and extends total system lifetime by making the entire network appear as a single virtual machine. It automatically and transparently partitions applications into components and dynamically finds them a placement on nodes within the network to reduce energy consumption and to increase system longevity. This paper describes our programming model, outlines the design and implementation of our system and examines the energy efficiency of our approach through extensive simulations as well as validation of a deployment on a physical testbed. We evaluate practical, power-aware, general-purpose algorithms for component placement and migration, and demonstrate that they can significantly increase system longevity by effectively distributing energy consumption and avoiding hotspots.
Determining node and event locations is a canonical task for many wireless network applications. Yet dedicated infrastructure for determining position information is expensive, energy-consuming, and simply unavailable in many deployment scenarios. This paper presents an accurate, cheap and scalable framework, called Sextant, for determining node position and event location in sensor networks. Sextant operates by setting up and solving a system of geographic constraints based on connectivity information from the underlying communication network. Sextant achieves high accuracy by enabling non-convex constraints to be used to refine position estimates. It represents position estimates as potentially noncontiguous collections of points. This general representation enables Sextant to use negative information, that is, information on where a node or event is not located, to refine location estimates. Sextant unifies both node and event detection within the same general framework. It can provide high precision without dedicated localization hardware by aggressively extracting constraints from the link layer, representing areas precisely with Bezier-enclosed polygons and probability distributions, and using event detection to refine node position estimates. A compact representation and a fully distributed implementation make the framework practical for resource-limited devices. The framework has been implemented, deployed and tested on laptops, PDAs and Mica-2 motes. Physical experiments show that a large number (98%) of the nodes in a network can determine their positions based on a small number (30%) of landmark nodes and that a large number (90%) of events can be located with low median error.
Traditional operating system interfaces and network protocol implementations force system state to be kept on both sides of a connection. Such state ties the connection to an endpoint, impedes transparent failover, permits denial-of-service attacks, and limits scalability. This paper introduces a novel TCP-like transport protocol and a new interface to replace sockets that together enable all state to be kept on one endpoint, allowing the other endpoint, typically the server, to operate without any per-connection state. Called Trickles, this approach enables servers to scale well with increasing numbers of clients, consume fewer resources, and better resist denial-of-service attacks. Measurements on a full implementation in Linux indicate that Trickles achieves performance comparable to TCP/IP, interacts well with other flows, and scales well. Trickles also enables qualitatively different kinds of networked services. Services can be geographically replicated and contacted through an anycast primitive for improved availability and performance. Widely-deployed practices that currently have client-observable side effects, such as periodic server reboots, connection redirection, and failover, can be made transparent, and perform well, under Trickles. The protocol is secure against tampering and replay attacks, and the client interface is backwards-compatible, requiring no changes to sockets-based client applications.
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multi-resolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in largescale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growth-constrained metric, a low-dimensional Euclidean metric, or a metric of low doubling dimension. Large scale simulations, based on latency measurements from 6.25 million node-pairs as well as an implementation deployed on PlanetLab show that the framework is accurate and effective.
This paper describes Credence, a distributed object reputation management scheme for combating content pollution in peer-to-peer filesharing systems. Credence enables honest peers to assess the authenticity of online content by securely tabulating and managing endorsements from other peers. Credence employs a novel voter correlation scheme to weight peer opinions, which gives rise to favorable incentives and system dynamics. We present simulation results indicating that our system is scalable, efficient, and robust.
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multi-resolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in largescale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growth-constrained metric, a low-dimensional Euclidean metric, or a metric of low doubling dimension. Large scale simulations, based on latency measurements from 6.25 million node-pairs as well as an implementation deployed on PlanetLab show that the framework is accurate and effective.
We present a simple rate matching-based mechanism for voltage adaptation in a microprocessor running a multiprogrammed workload. The mechanism incorporates a set of architecture and operating system extensions through which applications can communicate their actual and desired progress to the operating system. Using this feedback, the operating system uses a modified scheduling algorithm to run all applications at a single, globallyoptimal voltage. We demonstrate that significant energy savings are possible with a simple, practical set of extensions to the architecture and operating system.
Anonymity is increasingly important for networked applications amidst concerns over censorship and privacy. This paper outlines the design of HerbivoreFS, a scalable and efficient file sharing system that provides strong anonymity. HerbivoreFS provides computational guarantees that even adversaries able to monitor all network traffic cannot deduce the identity of a sender or receiver beyond an anonymizing clique of k peers. HerbivoreFS achieves scalability by partitioning the global network into smaller anonymizing cliques. Measurements on PlanetLab indicate that the system achieves high anonymous bandwidth when deployed on the Internet.
Name services are critical for mapping logical resource names to physical resources in large-scale distributed systems. The Domain Name System (DNS) used on the Internet, however, is slow, vulnerable to denial of service attacks, and does not support fast updates. These problems stem fundamentally from the structure of the legacy DNS. This paper describes the design and implementation of the Cooperative Domain Name System (CoDoNS), a novel name service, which provides high lookup performance through pro- active caching, resilience to denial of service attacks through automatic load-balancing, and fast propagation of updates. CoDoNS derives its scalability, decentralization, self-organi- zation, and failure resilience from peer-to-peer overlays, while it achieves high performance using the Beehive replication framework. Cryptographic delegation, instead of host-based physical delegation, limits potential malfeasance by names- pace operators and creates a competitive market for names- pace management. Backwards compatibility with existing protocols and wire formats enables CoDoNS to serve as a backup for legacy DNS, as well as a complete replacement. Performance measurements from a real-life deployment of the system in PlanetLab shows that CoDoNS provides fast lookups, automatically reconfigures around faults without man- ual involvement and thwarts distributed denial of service at- tacks by promptly redistributing load across nodes.
A secure, tamperproof execution environment is critical for trustworthy network computing. Newly emerging hardware, such as those developed as part of the TCPA and Palladium initiatives, enables operating systems to implement such an environment through Merkle hash trees. We examine the selection of optimal parameters, namely blocksize and tree depth, forMerkle hash trees based on the size of the memory region to be protected and the number of memory updates between updates of the hash tree. We analytically derive an expression for the cost of updating the hash tree, show that there is an optimal blocksize for the leaves of a Merkle tree for a given filesize and update interval that minimizes the cost of update operations, and describe a general method by which the parameters of such a tree can be determined optimally.
This paper describes staged simulation, a technique for improving the run time performance and scale of discrete event simulators. Typical network simulations are limited in speed and scale due to redundant computations, both within a single simulation run and between successive runs. Staged simulation proposes to restructure discrete event simulators to operate in stages that precompute, cache, and reuse partial results to drastically reduce redundant computation within and across simulations. We present a general and flexible framework for staging and identify the advantages and trade-offs of its application to wireless network simulations, a particularly challenging simulation domain. Experience with applying staged simulation to the ns2 simulator shows that staging can improve execution time by an order of magnitude or more and enable the simulation of wireless networks with tens of thousands of nodes.
Structured peer-to-peer hash tables provide decentralization, self-organization, failure-resilience, and good worst-case lookup performance for applications, but suffer from high latencies (O(logN)) in the average case. Such high latencies prohibit them from being used in many relevant, demanding applications such as DNS. In this paper, we present a proactive replication framework that can provide constant lookup performance for common Zipf-like query distributions. This framework is based around a closed-form optimal solution that achieves O(1) lookup performance with low storage requirements, bandwidth overhead and network load. Simulations show that this replication framework can realistically achieve good latencies, outperform passive caching, and adapt efficiently to sudden changes in object popularity, also known as flash crowds. This framework provides a feasible substrate for high-performance, low-latency applications, such as peer-to-peer domain name service.
CorSSO is a distributed service for authentication in networks. It allows application servers to delegate client identity checking to combinations of authentication servers potentially residing in separate administrative domains. In CorSSO, authentication policies enable the system to tolerate expected classes of attacks and failures. A novel partitioning of the work associated with authentication of principals means that the system scales well with increases in the numbers of users and services.
High lookup latencies prohibit peer-to-peer overlays from being used in many performance intensive applications, even though they provide self-organization, scalability, and failure resilience. In this paper, we show that lookup performance of structured DHTs can be improved to any desired constant, even under a single hop, by controlled proactive replication. By exploiting the popularity distribution of objects, we can minimize the number of replicas and reduce the storage and bandwidth cost of replication. This enables structured DHTs to efficiently support a wide variety of latency sensitive applications. We describe three different applications, namely DNS, web access, and content distribution, and show how they can derive significant performance gains by using DHTs.
This paper describes staged simulation, a technique for improving the run time performance and scale of discrete event simulators. Typical wireless network simulations are limited in speed and scale due to redundant computations, both within a single simulation run and between successive runs. Staged simulation proposes to reduce the amount of redundant computation within a simulation by restructuring discrete event simulators to operate in stages that precompute, cache, and reuse partial results. This paper presents a general and flexible framework for staging, and identifies the advantages and trade-offs of its application to wireless network simulations. Experience with applying staged simulation to the ns2 simulator shows that it can improve execution time by an order of magnitude in typical scenarios and make feasible the simulation of large scale wireless networks.
A central challenge in ad hoc networks is the design of routing protocols that can adapt their behavior to frequent and rapid changes in the network. The performance of proactive and reactive routing protocols varies with network characteristics, and one protocol may outperform the other in different network conditions. The optimal routing strategy depends on the underlying network topology, rate of change, and traffic pattern, and varies dynamically. This paper introduces the Sharp Hybrid Adaptive Routing Protocol (SHARP), which automatically finds the balance point between proactive and reactive routing by adjusting the degree to which route information is propagated proactively versus the degree to which it needs to be discovered reactively. SHARP enables each node to use a different application-specific performance metric to control the adaptation of the routing layer. This paper describes application-specific protocols built on top of SHARP for minimizing packet overhead, bounding loss rate, and controlling jitter. Simulation studies show that the resulting protocols outperform the purely proactive and purely reactive protocols across a wide range of network characteristics.
Peer-to-peer systems are typically designed around the assumption that all peers will willingly contribute resources to a global pool. They thus suffer from freeloaders, that is, participants who consume many more resources than they contribute. In this paper, we propose a general economic framework for avoiding freeloaders in peer-to-peer systems. Our system works by keeping track of the resource consumption and resource contribution of each participant. The overall standing of each participant in the system is represented by a single scalar value, called their karma. A set of nodes, called a bankset, keeps track of each nodes karma, increasing it as resources are contributed, and decreasing it as they are consumed. Our framework is resistant to malicious attempts by the resource provider, consumer, and a fraction of the members of the bank set. We illustrate the application of this framework to a peer-to-peer filesharing application.
In this paper, we describe three novel analyses for eliminating unnecessary synchronization that remove over 70% of dynamic synchronization operations on the majority of our 15 benchmarks and improve the bottom-line performance of three by 37-53%. Our analyses attack three frequent forms of unnecessary synchronization: thread-local synchronization, reentrant synchronization, and enclosed lock synchronization. We motivate the design of our analyses with a study of the kinds of unnecessary synchronization found in a suite of single- and multithreaded benchmarks of different sizes and drawn from a variety of domains. We analyze the performance of our optimizations in terms of dynamic operations removed and run-time speedup. We also show that our analyses may enable the use of simpler synchronization models than the model found in Java, at little or no additional cost in execution time. The synchronization optimizations we describe enable programmers to design efficient, reusable and maintainable libraries and systems in Java without cumbersome manual code restructuring.
Anonymity is increasingly important for networked applications amidst concerns over censorship and privacy. In this paper, we describe Herbivore, a peer-to-peer, scalable, tamper-resilient communication system that provides provable anonymity and privacy. Building on dining cryptographer networks, Herbivore scales by partitioning the network into anonymizing cliques. Adversaries able to monitor all network traffic cannot deduce the identity of a sender or receiver beyond an anonymizing clique. In addition to strong anonymity, Herbivore simultaneously provides high efficiency and scalability, distinguishing it from other anonymous communication protocols. Performance measurements from a prototype implementation show that the system can achieve high bandwidths and low latencies when deployed over the Internet.
Topological changes in mobile ad hoc networks frequently render routing paths unusable. Such recurrent path failures have detrimental effects on the network ability to support QoS-driven services. A promising technique for addressing this problem is to use multiple redundant paths between the source and the destination. However, while multipath routing algorithms can tolerate network failures well, their failure resilience only holds if the paths are selected judiciously. In particular, the correlation between the failures of the paths in a redundant path set should be as small as possible. However, selecting an optimal path set is an NPcomplete problem. Heuristic solutions proposed in the literature are either too complex to be performed in real-time, or too ineffective, or both. This paper proposes a multipath routing algorithm, called Disjoint Pathset Selection Protocol (DPSP), based on a novel heuristic that, in nearly linear time on average, picks a set of highly reliable paths. The convergence to a highly reliable path set is very fast, and the protocol provides flexibility in path selection and routing algorithm. Furthermore, DPSP is suitable for real-time execution, with nearly no message exchange overhead and with minimal additional storage requirements. This paper presents evidence that multipath routing can mask a substantial number of failures in the network compared to single path routing protocols, and that the selection of paths according to DPSP can be beneficial for mobile ad hoc networks, since it dramatically reduces the rate of route discoveries.
This paper presents an approach for formally specifying and enforcing security policies on web service implementations. Networked services in general, and web services in particular, require extensive amounts of code to ensure that clients respect siteintegrity constraints. We provide a language by which these constraints can be expressed and enforced automatically, portably and efficiently. Security policies in our system are specified in a language based on temporal logic, and are processed by an enforcement engine to yield site and platform-specific access control code. This code is integrated with a web server and platformspecific libraries to enforce the specified policy on a given web service. Our approach decouples the security policy specification from service implementations, provides a mandatory access control model for web services, and achieves good performance. We show that up to 22% of the code in a traditional web service module is dedicated to security checking functionality, including checks for client sequencing and parameter validation. We show that our prototype language implementation, WebGuard, enables web programmers to significantly reduce the amount of security checking code they need to develop manually. The quality of the code generated by WebGuard from formal policy specifications is competitive with the latency of handcrafted code to within a few percent.
Ad hoc and sensor networks are an important, emerging niche that is poorly supported by existing operating systems. In this paper, we argue that network-wide energy management is a primary concern in ad hoc networks, and that this functionality is best provided by a systems layer. We are currently designing and implementing a distributed, power-aware, adaptive operating system, called MagnetOS, specifically targeting ad hoc and sensor networks. MagnetOS provides a single system image of a unified Java virtual machine across the nodes that comprise an ad hoc network. By automatically and transparently partitioning applications into components and dynamically placing these components on nodes within the ad hoc network, our system reduces energy consumption, avoids hotspots and increases system longevity. We show that a systems approach to automatic object placement in an ad hoc network can increase system longevity by a factor of four to five.
In this paper, we describe PortOS, an educational operating system designed to complement undergraduate and graduate level classes on operating systems. PortOS is a complete user-level operating system project, with phases covering concurrency, synchronization, networking and file systems. It focuses particularly on ad-hoc and peer-to-peer distributed computing on mobile devices. This paper discusses alternative approaches to operating system projects, and presents our particular design point along with pedagogical justifications.
Anonymity is critical for many networked applications. Yet current Internet protocols provide no support for masking the identity of communication endpoints. This paper outlines a design for a peer-to-peer, scalable, tamper-resilient communication protocol that provides strong anonymity and privacy. Called CliqueNet, our protocol provides an information-theoretic guarantee: an omnipotent adversary that can wiretap at any location in the network cannot determine the sender of a packet beyond a clique, that is, a set of k hosts, where k is an anonymizing factor chosen by the participants. CliqueNet is resilient to jamming by malicious hosts and can scale with the number of participants. This paper motivates the need for an anonymous communication layer and describes the self-organizing, novel divide-and-conquer approach that enables CliqueNet to scale while offering a strong anonymity guarantee. CliqueNet is widely applicable as a communication substrate for peer-to-peer applications that require anonymity, privacy and anti-censorship guarantees.
Developing applications for ad-hoc and sensor networks poses significant challenges. Many interesting applications in these domains entail collaboration between components distributed throughout an ad-hoc network. Defining these components, optimally placing them on nodes in the ad-hoc network and relocating them in response to changes is a fundamental problem faced by such applications. Manual approaches to code and data migration are not only platform-dependent and error-prone, but also needlessly complicate application development. Further, locally optimal decisions made by applications that share the same network can lead to globally unstable and energy inefficient behavior. In this paper we describe the design and implementation of a distributed operating system for adhoc and sensor networks whose goal is to enable power-aware, adaptive, and easy-to-develop adhoc networking applications. Our system achieves this goal by providing a single system image of a unified Java virtual machine to applications over an ad-hoc collection of heterogeneous nodes. It automatically and transparently partitions applications into components and dynamically finds a placement of these components on nodes within the ad-hoc network to reduce energy consumption and increase system longevity. This paper outlines the design of our system and evaluates two practical, power-aware, online algorithms for object placement that form the core of our system. We demonstrate that our algorithms can increase system longevity by a factor of four to five by effectively distributing energy consumption, and are suitable for use in an energy efficient operating system in which applications are distributed automatically and transparently.
This paper describes the motivation, architecture and performance of a distributed virtual machine (DVM) for networked computers. DVMs rely on a distributed service architecture to meet the manageability, security and uniformity requirements of large, heterogeneous clusters of networked computers. In a DVM, system services, such as verification, security enforcement, compilation and optimization, are factored out of clients and located on powerful network servers. This partitioning of system functionality reduces resource requirements on network clients, improves site security through physical isolation and increases the manageability of a large and heterogeneous network without sacrificing performance. Our DVM implements the Java virtual machine, runs on x86 and DEC Alpha processors and supports existing Javaenabled clients.
The Java Virtual Machine (JVM) has emerged as a ubiquitous platform for network computing. JVMs have been incorporated into browsers, web servers, database engines, and personal digital assistants, and there are plans under way to use them on embedded devices and smartcards. The adoption of JVMs into such diverse and security critical domains requires that their safety and security be tested rigorously. Unfortunately, Java virtual machines are large, complex, and have many subtleties, and consequently testing them is a difficult task. Commercial virtual machines deployed so far have had to rely on manual and ad hoc techniques for testing, and have subsequently exhibited substantial weaknesses that could lead to information theft or destruction. In this paper, we describe our experience with automatically testing Java virtual machines. We outline two effective automatic testing techniques for JVMs. The first technique is comparative evaluation with mutations, where randomly perturbed test inputs are used to identify discrepancies between different versions of JVMs. We show that this fast and effective technique achieves broad code coverage, and discuss its shortcomings. Second, we present a well-structured technique for generating complex test cases from cogent grammar descriptions. We describe lava, a special purpose language we developed to specify production grammars for testing, and show that grammar-based test generation can produce very complex test cases from compact specifications. The testing process is easy to steer, and can generate targeted test cases. Most importantly, grammars enable the tester to reason about the expected system behavior on the test inputs. We show how grammars can be used in conjunction with inductive proofs to solve the oracle problem, and describe various application areas for grammar based testing. Finally, we report the results of applying these techniques to commercial Java virtual machine implementations.
Extensible typesafe systems, such as Java, rely critically on a large and complex software base for their overall protection and integrity, and are therefore difficult to test and verify. Traditional testing techniques, such as manual test generation and formal verification, are too time consuming, expensive, and imprecise, or work only on abstract models of the implementation and are too simplistic. Consequently, commercial virtual machines deployed so far have exhibited numerous bugs and security holes. In this paper, we discuss our experience with using production grammars in testing large, complex and safety-critical software systems. Specifically, we describe lava, a domain specific language we have developed for specifying production grammars, and relate our experience with using lava to generate effective test suites for the Java virtual machine. We demonstrate the effectiveness of production grammars in generating complex test cases that can, when combined with comparative and variant testing techniques, achieve high code and value coverage. We also describe an extension to production grammars that enables concurrent generation of certificates for test cases. A certificate is a behavioral description that specifies the intended outcome of the generated test case, and therefore acts as an oracle by which the correctness of the tested system can be evaluated in isolation. We report the results of applying these testing techniques to commercial Java implementations. We conclude that the use of production grammars in combination with other automated testing techniques is a powerful and effective method for testing software systems, and is enabled by a special purpose language for specifying extended production grammars.
This paper presents and evaluates a set of analyses designed to reduce synchronization overhead in Java programs. Monitor-based synchronization in Java often causes significant overhead, accounting for 5-10% of total execution time in our benchmark applications. To reduce this overhead, programmers often try to eliminate unnecessary lock operations by hand. Such manual optimizations are tedious, error-prone, and often result in poorly structured and less reusable programs. Our approach replaces manual optimizations with static analyses that automatically find and remove unnecessary synchronization from Java programs. These analyses optimize cases where a monitor is entered multiple times by a single thread, where one monitor is nested within another, and where a monitor is accessible by only one thread. A partial implementation of our analyses eliminates up to 70% of synchronization overhead and improves running time by up to 5% for several already hand-optimized benchmarks. Thus, our automated analyses have the potential to significantly improve the performance of Java applications while enabling programmers to design simpler and more reusable multithreaded code.
In emerging application domains, such as thin client computing and hypertext systems with embedded objects (e.g. the World Wide Web), the process of downloading application code is in the critical path of users. We observe that in these domains, compliance with existing standards and minimizing the impact on the clients is crucial. We argue that the fundamental problem for mobile code is that the units of code distribution in networked object systems, such as Java, are not suited for efficient utilization of network bottlenecks. In this paper, we propose a separate optimization step, between compilation and loading, whereby application code is restructured to more effectively use the available network bandwidth for program download. We have designed and implemented such an optimization step as a binary rewriting service for Java applets and applications. Our implementation does not require any modifications to existing Java virtual machines, compilers or clients. We have found that restructuring of Java applications can improve program startup times by up to 30%.
Modern virtual machines, such as Java and Inferno, are emerging as network computing platforms. While todays virtual machines provide higher-level abstractions and more sophisticated services than their predecessors, and while they have migrated from dedicated mainframes to heterogeneous networked computers, their architecture has essentially remained intact. State of the art virtual machines are still monolithic, that is, all system components reside on the same host and are replicated among all clients in an organization. This crude replication of services among clients creates problems of security, manageability, performance and scalability. We propose a distributed architecture for virtual machines based on distributed service components. In our proposed system, services that control security, resource management, and code optimization are factored out of clients and reside in enterprisewide network servers. The services produce self-certifying, self-regulating, selfoptimizing programs via binary rewriting. We are currently building a Java virtual machine based on this architecture. We argue that distributed virtual machine architectures enable higher integrity, manageability, performance and scalability than monolithic virtual machines where all components reside on all clients.
Modern virtual machines, such as Java and Inferno, are emerging as network computing platforms. While these virtual machines provide higher-level abstractions and more sophisticated services than their predecessors from twenty years ago, their architecture has essentially remained unchanged. State of the art virtual machines are still monolithic, that is, they are comprised of closely-coupled service components, which are thus replicated over all computers in an organization. This crude replication of services forms one of the weakest points in todays networked systems, as it creates widely acknowledged and well-publicized problems of security, manageability and performance. We have designed and implemented a new system architecture for network computing based on distributed virtual machines. In our system, virtual machine services that perform rule checking and code transformation are factored out of clients and are located in enterprisewide network servers. The services operate by intercepting application code and modifying it on the fly to provide additional service functionality. This architecture reduces client resource demands and the size of the trusted computing base, establishes physical isolation between virtual machine services and creates a single point of administration. We demonstrate that such a distributed virtual machine architecture can provide substantially better integrity and manageability than a monolithic architecture, scales well with increasing numbers of clients, and does not entail high overhead.
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applica- tions to safely change the operating system's interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and function- ality. SPIN uses language and link-time mechanisms to inexpensively export fine-grained interfaces to operat- ing system services. Extensions are written in a type safe language, and are dynamically linked into the op- erating system kernel. This approach offers extensions rapid access to system services, while protecting the op- erating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations.
Application domains such asmultimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed interfaces and implementations to system services and resources. This makes them inappropriate for applications whose resource demands and usage patterns are poorly matched by the services provided. The SPIN operating system enables system services to be defined in an application-specific fashion through an extensible microkernel. It offers applications fine-grained control over a machines logical and physical resources through run-time adaptation of the system to application requirements.
Application domains such asmultimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed interfaces and implementations to system services and resources. This makes them inappropriate for applications whose resource demands and usage patterns are poorly matched by the services provided. The SPIN operating system enables system services to be defined in an application-specific fashion through an extensible microkernel. It offers applications fine-grained control over a machines logical and physical resources through run-time adaptation of the system to application requirements.
Application domains, such as multimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed interfaces and implementations to system services and resources. This makes them inappropriate for applications whose resource demands and usage patterns are poorlymatched by the services provided. The SPIN operating systemenables system services to be defined in an application-specific fashion through an extensible microkernel. It offers fine-grained control over a machines logical and physical resources to applications through run-time adaptation of the system to application requirements.
Some recent talks appear below. A full listing of all my talks can be found here.
| A new bibliographic management tool to replace BibTeX | |
| An optimal failure detector implementation | |
| An open service that maps a client to the nearest server | |
| An open-access content distribution network, similar to Akamai | |
| LimeWire client for filesharing with reputation management | |
| Lightweight, scalable network positioning | |
| Replacement and safety net for DNS | |
| High performance, scalable wireless network simulator | |
| An instructional OS platform for Windows | |
| Java verification and disassembly | |
| Extensible typesafe standalone operating system | |
| MIPS simulator | |
| Dive computer for X windows | |
| At the request of the IPTPS steering committee, I am keeping a mirror of past IPTPS workshops. |
| ACM Symposium on Networked System Design and Implementation, 2008, Co-chair. | |
| Symposium on Principles of Distributed Computing, 2008. | |
| ACM Symposium on Networked System Design and Implementation, 2008. | |
| European Conference on Computer Systems, 2008. | |
| ACM Symposium on Networked System Design and Implementation, 2007. | |
| Usenix Annual Technical Conference, 2007. | |
| ACM SIGCOMM Conference, 2006. | |
| Workshop on the Economics of Networked Systems, 2006 | |
| Conference on Middleware, 2006. | |
| The First EuroSys Authoring Workshop, 2006, Organizing Committee. | |
| ACM Symposium on Networked System Design and Implementation, 2006. | |
| The Fifth International Workshop on Peer-to-Peer Systems, 2006, Co-Chair. | |
| Third Workshop on Economics of Peer-to-Peer Systems, 2005, Co-Chair. | |
| ACM Symposium on Operating System Principles, Operating Systems Session Chair, 2005. | |
| IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2005. | |
| The Fourth International Workshop on Peer-to-Peer Systems, 2005. | |
| International Conference on Distributed Computing Systems, 2005. | |
| ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, 2005. | |
| IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2004. | |
| Workshop on the Economics of Peer-to-Peer Systems, 2004. |
|
|
Directions to the Cornell CS Department.
I enjoy sailing (I14s, sailboards), backcountry skiing and photography.