|
EMİN GÜN SİRER | |
|
Associate Professor
4151 Upson Hall |
(607) 255-7673 (607) 255-4428 fax Turn on JavaScript to view email address Twitter: el33th4xor Blog: Hacking, Distributed |
|
My research spans operating systems, networking and distributed systems. My current projects involve a novel secure operating system and system infrastructure for high-performance cloud computing applications. I like building things, especially systems that have some principled reason for why they should work.
Here is some information for prospective students interested in working with me.
Recent Work | |
| HyperDex | HyperDex is a new NoSQL key-value store. It is fast, consistent, and fault tolerant, with a rich API that includes an efficient search primitive. It is significantly faster than Cassandra, MongoDB and Redis; scales well; and provides a level of consistency and fault tolerance not found in other systems. |
| OpenReplica | OpenReplica is a service for replicating objects based on our ConCoord Paxos imlementation for replica coordination. The OpenReplica service enables anyone to quickly and easily replicate objects, to locate replicas through DNS, and to dynamically modify the replica set. It's comparable to Yahoo's ZooKeeper and Google's Chubby, except we run the system as a service, therefore anyone can deploy replicated objects simply by uploading a Python object. OpenReplica achieves performance that is comparable to or better than ZooKeeper for less than 6 replicas. |
| Nexus | I am building a new operating system called Nexus. Nexus introduces a new driver architecture, new abstractions and new mechanisms that enable secure, trustworthy applications. The system boots standalone on x86 platforms ideally equipped with TPMs; it can send mail, play movies, run cloud applications and execute Linux programs. |
Peer-to-peer Systems | |
| Cubit | Cubit is a peer-to-peer overlay that enables approximate searches in large networks with no centralized components. Given a potentially misspelled keyword, Cubit finds all objects containing that keyword. The project was inspired by the various legal attacks and attempts to take down torrent sites. The key idea behind Cubit is to construct a metric space for keywords, map it onto nodes in a small world, and then to traverse the nodes using a greedy routing algorithm. We have implemented Cubit as an Azureus plugin. |
| Octant | Octant is a system for determining the physical location of Internet hosts. Given a host, Octant determines the boundaries of the region in which the node is likely to lie. Behind the scenes, Octant consists of two parts: a comprehensive framework for efficiently representing and combining a system of constraints, and a set of mechanisms for extracting tight constraints on node locations without resulting in an overconstrained system. |
| Meridian | Meridian is a peer-to-peer overlay network for performing location-aware node and path selection in large-scale distributed systems. It is simple to deploy, robust to churn, and can accurately find the nearest node, pick the most centrally placed node, and find a node that fits latency constraints. |
| Beehive CoDoNS CobWeb CorONA |
Beehive is a high-performance distributed hash table. A novel optimization technique enables Beehive to respond to queries quickly, tolerate denial of service attacks, and balance load. Beehive provides strong performance guarantees even in the presence of queries drawn from Power Law distributions, previously thought to be difficult because heuristics-based approaches tried in the past do not work well with such distributions. We have used Beehive to build new, resilient infrastructure services for the Internet. CoDoNs is a safety net and a replacement for the Domain Name System that provides strong security, performance, and fast dynamic updates for existing Internet names. CobWeb is an Akamai-like open-access content distribution network. CorONA is a high-performance publish-subscribe system for web micronews. |
| Sqrt(S) |
|
| Credence | Credence is a reputation system for peer-to-peer networks, designed to provide an accurate metric for the trustworthiness of labels associated with shared files. It differs from previous work in that it derives its trust metric from principled measures that reflect likelihood of similar behavior between peers, has a completely distributed architecture with no trusted nodes, and a concrete implementation. Credence can guard against Sybil attacks and other malicious behavior from spammers. The Credence implementation is free, open-source, and backwards compatible with Gnutella. |
| CorSSO | CorSSO is a distibuted authentication service that provides network identities that span multiple application services, also known as single sign-on. It enables authentication functionality to be factored out of application services and delegated to combinations of authentication servers. It uses threshold cryptography for efficiency, fault tolerance and resiliance against attackers. |
| Karma | Karma is a virtual currency for use in keeping track of users' resource contribution and consumption in peer-to-peer systems. The karma system provides a secure exchange mechanism for a self-regulating, incentive-compatible, decentralized currency. |
| Herbivore | Herbivore is a peer-to-peer, self-organizing, robust system for anonymous communication. It uses dining cryptographer networks to provide anonymity guarantees even in the presence of attackers which can eavesdrop on every packet in the network. It's a follow-on to the CliqueNet project. |
Operating Systems and Virtual Machines | |
| Nexus | I am building a new operating system called Nexus. Nexus introduces a new driver architecture, new abstractions and new mechanisms that enable secure, trustworthy applications. The system boots standalone on x86 platforms ideally equipped with TPMs; it can send mail, play movies, run cloud computing applications and execute Linux programs. |
| Trickles | Trickles is a high-performance protocol for stateless, connection-oriented communication. It comprises a transport protocol to replace TCP and a new interface to replace sockets that allow all server-side state to be shipped to clients. This leads to applications that are more scalable and robust against denial-of-service attacks. And it enables a new class of services: since a stateless network stack allows packets to be serviced at any server replica regardless of past history, Trickles enables fast transparent failover, fine-grain load balancing and connection-oriented anycast services to be implemented transparently inside the network fabric. |
| Portos | Portos is an emulation-based system and a set of corresponding projects that I use to teach introductory-level operating systems. The projects cover traditional topics, as well as new issues (such as routing and self-organization) raised by mobile, ubiquitous computing. The base emulation framework emulates a virtual processor, with attached virtual devices, on Windows NT (including 98/ME/XP/etc.) and CE. It runs on x86 and StrongArm processors. It's freely available. |
| Kimera |
The goal of the Kimera project is to enable networks of computers that
are cheaper, more secure and more manageable than what we have
now. The problem with current state of the art systems, like Java, is that network clients are monolithic, that is, they implement all requisite
system services locally. Consequently, every endpoint needs to
have sufficient resources to support services like verification,
compilation and security management, which require too much memory
and processing power for embedded devices. Further, each endpoint
entails associated software state that is hard to secure and manage.
These problems become particularly acute as the number of clients
increases.
The Kimera project addresses these problems by factoring system components out of clients into network servers. The clients can thus be smaller because they do not have to support complex services locally. Further, the overall network is easier to manage because service functionality is centralized. The services operate by intercepting applications as they are fetched by clients; they inspect applications, and where necessary, inject code snippets into them to provide the requisite service functionality. Digital signatures ensure that only applications that have been vetted by centralized services execute on the clients. |
| SPIN |
The goal of the SPIN project is to enable applications to customize
operating system functionality by downloading application-specific
extensions dynamically into the kernel. The extensions, as well as the
core system, are written in Modula-3, whose type-safety ensures load
and store protection. A namespace management interface restricts the
interfaces that applications can access, and creates multiple
software protection domains within a single privileged address space.
The SPIN execution model enables applications to safely extend the kernel with thread packages and hierarchical scheduling policies. A combination of language and system mechanisms provide security and fault-isolation while mediating access to processors. The SPIN protection domain interface simultaneously allows isolation and safe fine-grain sharing within a privileged address space. Extensions reside in hierarchical namespaces which they can use to share or to hide code and data at the granularity of interfaces. Use of hierarchical capabilities simplifies security management. Extensions that want to safely share code and data can do so without dynamic protection enforcement overhead. The SPIN web server uses all of these mechanisms to implement high-performance web service. |
| MIPSI | MIPSI is a robust and extensive MIPS instruction set simulator. It has been used in many classes and research projects. |
Ad Hoc Networks | |
| MagnetOS | MagnetOS is an operating system for ad hoc networks. It makes the entire network appear as a single Java virtual machine. It enables applications to be constructed easily and to execute efficiently. |
| Sextant | Sextant is a general framework for discovering the location of nodes and events in wireless networks. Sextant enables nodes without specialized hardware, such as GPS, to efficiently discover their approximate locations. Further, given a sensor network where each node can sense events in its immediate vicinity, Sextant pinpoints the location of events with high fidelity. |
| SHARP | SHARP is a hybrid routing protocol that dynamically finds the optimal mix of proactive route dissemination and reactive route discovery to achieve application-specific performance goals. |
| SNS | SNS is a scalable, high-performance wireless network simulator, based on ns2. It vastly outperforms standard ns2 in speed and scale through a new technique we developed called staged simulation. It achieves its improvements in speed and scale by eliminating redundant computations both within a single and across multiple simulation runs. It has been carefully validated against ns2 - staging preserves accuracy while speeding up simulations. |
| Fall 2012 | CS 4410, Operating Systems | Introduction to operating systems. |
| The Cornell Systems Lunch | We will be reading recent papers on systems topics. |
Previous courses I have taught.
Efficient content distribution in large networks comprising datacenters, end hosts, and distributed in-network caches is a difficult problem. Existing systems rely on mechanisms and metrics that fail to effectively utilize all available sources of bandwidth in the network. This paper presents a novel metric, called the Content Propagation Metric (CPM), for quantitatively evaluating the marginal benefit of available bandwidth to competing consumers, enabling efficient utilization of the bandwidth resource. The metric is simple to implement, imposes only a modest overhead, and can be retrofitted easily into existing content distribution systems. We have designed and implemented a high-performance content distribution system, called V-Formation, based on the CPM. The CPM guides V-Formation toward a global allocation of bandwidth that maximizes the aggregate download bandwidth of consumers. Results from a PlanetLab deployment and extensive simulations show that V-Formation achieves high aggregate bandwidth and that the CPM enables hosts to converge quickly on a stable allocation ofresources in a wide range of deployment scenarios.
This paper presents the design and implementation of NetQuery, a knowledge plane for federated networks such as the Internet. In such networks, not all administrative domains will generate information that an application can trust and many administrative domains may have restrictive policies on disclosing network information. Thus, both the trustworthiness and accessibility of network information pose obstacles to effective reasoning. NetQuery employs trustworthy computing techniques to facilitate reasoning about the trustworthiness of information contained in the knowledge plane while preserving confidentiality guarantees for operator data. By characterizing information disclosure between operators, NetQuery enables remote verification of advertised claims and contractual stipulations; this enables new applications because network guarantees can span administrative boundaries. We have implemented NetQuery, built several NetQuery-enabled devices, and deployed applications for cloud datacenters, enterprise networks, and the Internet. Simulations, testbed experiments, and a deployment on a departmental network indicate NetQuery can support hundreds of thousands of operations per second and can thus scale to large ISPs.
Nexus Authorization Logic (NAL) provides a principled basis for specifying and reasoning about credentials and authorization policies. It extends prior access control logics that are based on śaysánd śpeaks foróperators. NAL enables authorization of access requests to depend on (i) the source or pedigree of the requester, (ii) the outcome of any mechanized analysis of the requester, or (iii) the use of trusted software to encapsulate or modify the requester. To illustrate the convenience and expressive power of this approach to authorization, a suite of document-viewer applications was implemented to run on the Nexus operating system. One of the viewers enforces policies that concern the integrity of excerpts that a document contains; another viewer enforces confidentiality policies specified by labels tagging blocks of text.
This paper examines an extreme point in the design space of programmable switches and network policy enforcement. Rather than relying on extensive changes to switches to provide more programmability, SideCar distributes custom processing code between shims running on every end host and general purpose sidecar processors, such as server blades, connected to each switch via commonly available redirection mechanisms. This provides applications with pervasive network instrumentation and programmability on the forwarding plane. While not a perfect replacement for programmable switches, this solves several pressing problems while requiring little or no change to existing switches. In particular, in the context of public cloud data centers with thousands of tenants, we present novel solutions for multicast, controllable network bandwidth allocation (e.g., use-what-you-pay-for), and reachability isolation (e.g., a tenantś VM only sees other VMs of the tenant and shared services).
Existing content aggregators provide fast and efficient access to large volumes of shared data and serve as critical centralized components of many peer-to-peer systems, including content discovery for BitTorrent. These aggregatorsóperators are tasked to spend significant human resources to manually vet uploaded data to ensure compliance with copyright laws. This task does not scale with today’s increasing demand for such services. In this paper, we introduce Blindfold, a scheme to ensure that the operators of content aggregators are completely blind to the content that they are storing and serving, thereby eliminating the possibility to censor content at the servers. It works by partitioning the search and upload operations into a series of dependent key-value operations across servers under different administrative domains, with the connection between servers obfuscated using captchas. We have implemented a prototype of Blindfold to show that it is a simple, feasible, and efficient system for serving content that is opaque to the storage servers.
This paper describes Antfarm, a content distribution system based on managed swarms. A managed swarm couples peer-to-peer data exchange with a coordinator that directs bandwidth allocation at each peer. Antfarm achieves high throughput by viewing content distribution as a global optimization problem, where the goal is to minimize download latencies for participants subject to bandwidth constraints and swarm dynamics. The system is based on a wire protocol that enables the Antfarm coordinator to gather information on swarm dynamics, detect misbehaving hosts, and direct the peers’ allotment of upload bandwidth among multiple swarms. Antfarm’s coordinator grants autonomy and local optimization opportunities to participating nodes while guiding the swarms toward an efficient allocation of resources. Extensive simulations and a PlanetLab deployment show that the system can significantly outperform centralized distribution services as well as swarming systems such as BitTorrent.
Device drivers typically execute in supervisor mode and thus must be fully trusted. This paper describes how to move them out of the trusted computing base, by running them without supervisor privileges and constraining their interactions with hardware devices. An implementation of this approach in the Nexus operating system executes drivers in user space, leveraging hardware isolation and subjecting them to reference validation. These Nexus drivers exhibit performance nearly as fast as earlier in-kernel, trusted drivers. For example, the monitored driver for an Intel e1000 Ethernet card has throughput comparable to a trusted driver for the same hardware under Linux. And a monitored driver for the Intel i810 sound card provides continuous playback. Drivers for a disk and a USB mouse have also been moved successfully to operate in Nexus user space with reference validation.
Traditional operating system interfaces and network protocol implementations force some system state to be kept on both sides of a connection. This state ties the connection to its endpoints, impedes transparent failover, permits denial-of-service attacks, and limits scalability. This article introduces a novel TCP-like transport protocol and a new interface to replace sockets that together enable all state to be kept on one endpoint, allowing the other endpoint, typically the server, to operate without any per-connection state. Called Trickles, this approach enables servers to scale well with increasing numbers of clients, consume fewer resources, and better resist denial-of-service attacks. Measurements on a full implementation in Linux indicate that Trickles achieves performance comparable to TCP/IP, interacts well with other flows, and scales well. Trickles also enables qualitatively different kinds of networked services. Services can be geographically replicated and contacted through an anycast primitive for improved availability and performance. Widely-deployed practices that currently have client-observable side effects, such as periodic server reboots, connection redirection, and failover, can be made transparent, and perform well, under Trickles. The protocol is secure against tampering and replay attacks, and the client interface is backward-compatible, requiring no changes to sockets-based client applications.
Keyword search is a critical component in most content retrieval systems. Despite the emergence of completely decentralized and efficient peer-to-peer techniques for content distribution, there have not been similarly efficient, accurate, and decentralized mechanisms for content discovery based on approximate search keys. In this paper, we present a scalable and efficient peer-to-peer system called Cubit with a new search primitive that can efficiently find the k data items with keys most similar to a given search key. The system works by creating a keyword metric space that encompasses both the nodes and the objects in the system, where the distance between two points is a measure of the similarity between the strings that the points represent. It provides a loosely-structured overlay that can efficiently navigate this space. We evaluate Cubit through both a real deployment as a search plugin for a popular BitTorrent client and a large-scale simulation and show that it provides an efficient, accurate and robust method to handle imprecise string search in filesharing applications.
You’ve heard of Moore’s Law. This paper points out some lesser-known exponential growth trends in computer science regarding power buttons, volume controls, pixel count, and sensor networking hardware, and makes some futuristic predictions.
Peer-to-peer systems, in which participants pool their resources to accomplish their goals, have become ubiquitous. Since rational peers engage in strategic behavior, much past work has examined the design of mechanisms that incentivize peers to provide resources. The predominant design paradigm to date has been tit-for-tat, in which each peer benefits from every interaction. In this paper, we discuss the limitations of the tit-for-tat design paradigm and make the case for a new approach based on seeking the globally optimal outcome known as the common good.
Existing distributed hash tables provide efficient mechanisms for storing and retrieving a data item based on an exact key, but are unsuitable when the search key is similar, but not identical, to the key used to store the data item. In this paper, we present a scalable and efficient peerto-peer system with a new search primitive that can efficiently find the k data items with keys closest to the search key. The system works via a novel assignment of virtual coordinates to each object in a high-dimensional, synthetic space such that the proximity between two points in the coordinate space is correlated with the similarity between the strings that the points represent. We examine the feasibility of this approach for efficient, peer-to-peer search on inexact string keys, and show that the system provides a robust method to handle key perturbations that naturally occur in applications, such as file-sharing networks, where the query strings are provided by users.
Determining the physical location of Internet hosts is a critical enabler for many new location-aware services. In this paper, we present Octant, a novel, comprehensive framework for determining the location of Internet hosts in the real world based solely on network measurements. The key insight behind this framework is to pose the geolocalization problem formally as one of error-minimizing constraint satisfaction, to create a system of constraints by deriving them aggressively from network measurements, and to solve the system geometrically to yield the estimated region in which the target resides. This approach gains its accuracy and precision by taking advantage of both positive and negative constraints, that is, constraints on where the node can and cannot be, respectively. The constraints are represented using regions bounded by Bézier curves, allowing precise constraint representation and low-cost geometric operations. The framework can reason in the presence of uncertainty, enabling it to gracefully cope with aggressively derived constraints that may contain errors. An evaluation of Octant using PlanetLab nodes and public traceroute servers shows that Octant can localize the median node to within 22 mi., a factor of three better than other evaluated approaches.
Failure detectors are fundamental building blocks in distributed systems. Multi-node failure detectors, where the detector is tasked with monitoring N other nodes, play a critical role in overlay networks and peer-to-peer systems. In such networks, failures need to be detected quickly and with low overhead. Achieving these properties simultaneously poses a difficult tradeoff between detection latency and resource consumption.In this paper, we examine this central tradeoff, formalize it as an optimization problem and analytically derive the optimal closed form formulas for multi-node failure detectors. We provide two variants of the optimal solution for optimality metrics appropriate for two different deployment scenarios. Sqrt(s)-LM is a latency-minimizing optimal failure detector that achieves the lowest average failure detection latency given a fixed bandwidth constraint for system maintenance. Sqrt(s)-BM is a bandwidth-minimizing optimal failure detector that meets a desired detection latency target with the least amount of bandwidth consumed. We evaluate our optimal results with node lifetimes chosen from bimodal and Pareto distributions, as well as real-world trace data from PlanetLab hosts, web sites and Microsoft PCs. Compared to standard failure detectors in wide use, sqrt-s failure detectors reduce failure detection latencies by 40% on average for the same bandwidth consumption, or conversely, reduce the amount of bandwidth consumed by 30% for the same failure detection latency.
Security modifications to legacy network protocols are expensive and disruptive. This paper outlines an approach, based on external security monitors, for securing legacy protocols by deploying additional hosts that locally monitor the inputs and outputs of each host executing the protocol, check the behavior of the host against a safety specification, and communicate using an overlay to alert other hosts about invalid behavior and to initiate remedial actions. Trusted computing hardware provides the basis for trust in external security monitors. This paper applies this approach to secure the Border Gateway Protocol, yielding an external security monitor called N-BGP. N-BGP can accurately monitor a BGP router using commodity trusted computing hardware. Deploying N-BGP at a random 10% of BGP routers is sufficient to guarantee the security of 80% of Internet routes where both endpoints are monitored by N-BGP. Overall, external security monitors secure the routing infrastructure using trusted computing hardware and construct a security plane for BGP without having to modify the large base of installed routers and servers.
This paper outlines a novel, comprehensive framework for geolocalization, that is, determining the physical location of Internet hosts based on network measurements. The core insight behind this framework is to pose the geolocalization problem formally as one of error-minimizing constraint satisfaction, to create a system of constraints by deriving them aggressively from network measurements, and to solve the system using cheap and accurate geometric methods. The framework is general and accommodates both positive and negative constraints, that is, constraints on where the node can or cannot be, respectively. It can reason in the presence of uncertainty, enabling it to gracefully cope with aggressively derived constraints that may contain errors. Since the solution space is represented geometrically as a region bounded by Bézier curves, the framework yields an accurate set of all points where the target may be located. Preliminary results on PlanetLab show promise; the framework can localize the median node to within 22 miles, a factor of three better than previous approaches, with little error.
The web has failed to fulfill its promise of delivering relevant news and information in a timely fashion. In fact, it does not deliver anything on its own at all; instead, it requires its users to explicitly poll information sources. Checking for updates by pointing, clicking, and reloading Web sites, whether the sites are Slashdot, news, or online classifieds, is not only slow, inefficient, and cumbersome for users, but it places an unnecessary bandwidth burden on content providers. Recent attempts to automate this process, with the aid of feed readers, have created more problems than they have solved. A system that detects updates to content anywhere on the Web and delivers it to users via an asynchronous channel, such as an instant message, would do much to relieve the burden on users and content providers alike.
Despite the abundance of frequently changing information, the Web lacks a publish-subscribe interface for delivering updates to clients. The use of naive polling for detecting updates leads to poor performance and limited scalability as clients do not detect updates quickly and servers face high loads imposed by active polling. This paper describes a novel publish-subscribe system for the Web called Corona, which provides high performance and scalability through optimal resource allocation. Users register interest in Web pages through existing instant messaging services. Corona monitors the subscribedWeb pages, detects updates efficiently by allocating polling load among cooperating peers, and disseminates updates quickly to users. Allocation of resources for polling is driven by a distributed optimization engine that achieves the best update performance without exceeding load limits on content servers. Large-scale simulations and measurements from PlanetLab deployment demonstrate that Corona achieves orders of magnitude improvement in update performance at a modest cost.
In this paper, we describe Credence, a decentralized object reputation and ranking system for large-scale peer-to-peer filesharing networks. Credence counteracts pollution in these networks by allowing honest peers to assess the authenticity of online content through secure tabulation and management of endorsements from other peers. Our system enables peers to learn relationships even in the absence of direct observations or interactions through a novel, flow-based trust computation to discover trustworthy peers. We have deployed Credence as an overlay on top of the Gnutella filesharing network, with more than 10,000 downloads of our client software to date. We describe the system design, our experience with its deployment, and results from a long-term study of the trust network built by users. Data from the live deployment shows that Credence’s flow-based trust computation enables users to avoid undesirable content. Honest Credence clients can identify three quarters of the decoys encountered when querying the Gnutella network.
ClosestNode.com is an accurate, scalable, and backwards-compatible service for mapping clients to a nearby server. It provides a DNS interface by which unmodified clients can look up a service name, and get the IP address of the closest server. A shared system for performing such a mapping amortizes the administration and implementation costs of proximity-based server selection. It is aimed at minimizing the amount of effort required for system developers to make new and existing infrastructure services proximity-aware.
This paper examines replication in content distribution networks and proposes a novel mechanism for optimally resolving performance versus cost tradeoffs. The key insight behind our work is to formally and analytically capture the relationship between performance, bandwidth overhead and storage requirements for a web cache, express the system goals as a mathematical optimization problem, and solve for the optimal extent of replication that achieves the desired system goals with minimal overhead. We describe the design and implementation of a new content distribution network based on this concept, called CobWeb. CobWeb can achieve a target lookup latency while minimizing network and storage overhead, minimize access time while keeping bandwidth usage below a set limit, and alleviate flash crowd effects by rapidly replicating popular objects through fast and highly adaptive replica management. We outline the architecture of the CobWeb system, describe its novel optimization algorithm for intelligent resource allocation, and compare, through simulations and a physical deployment on PlanetLab, CobWebś informed, analysis-driven replication strategy to existing approaches based on passive caching and heuristics.
While publish-subscribe systems have attracted much research interest since the last decade, few established benchmarks have emerged, and there has been little characterization of how publish-subscribe systems are used in practice. This paper examines RSS, a newly emerging, widely used publish-subscribe system for Web micronews. Based on a trace study spanning 45 days at a medium-size academic department and periodic polling of approximately 100,000 RSS feeds, we extract characteristics of RSS content and usage. We find that RSS workload resembles the Web in content size and popularity; feeds are typically small (less than 10KB), albeit with a heavy tail, and feed popularity follows a power law distribution. The update rate of RSS feeds is widely distributed; 55% of RSS feeds are updated hourly, while 25% show no updates for several days. And, only small portions of RSS content typically change during an update; 64% of updates involve less than three lines of the RSS content. Overall, this paper presents an analysis of RSS, the first widely deployed publish-subscribe system, and provides insights for the design of next generation publish-subscribe systems.
The Domain Name System, DNS, is based on nameserver delegations, which introduce complex and subtle dependencies between names and nameservers. In this paper, we present results from a large scale survey of DNS, and show that these dependencies lead to a highly insecure naming system. We report specifically on three aspects of DNS security: the properties of the DNS trusted computing base, the extent and impact of existing vulnerabilities in the DNS infrastructure, and the ease with which attacks against DNS can be launched. The survey shows that a typical name depends on 46 servers on average, whose compromise can lead to domain hijacks, while names belonging to some countries depend on a few hundred servers. An attacker exploiting well-documented vulnerabilities in DNS nameservers can hijack more than 30% of the names appearing in the Yahoo and DMOZ.org directories. And certain nameservers, especially in educational institutions, control as much as 10% of the namespace.
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multi-resolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in largescale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growth-constrained metric, a low-dimensional Euclidean metric, or a metric of low doubling dimension. Large scale simulations, based on latency measurements from 6.25 million node-pairs as well as an implementation deployed on PlanetLab show that the framework is accurate and effective.
Peer-to-peer filesharing is now commonplace and its traffic now dominates bandwidth consumption at many Internet peering points. Recent studies indicate that much of this filesharing activity involves corrupt and polluted files. This paper describes Credence, a new object-based reputation system, and shows how it can counteract content pollution in peer-to-peer filesharing networks. Credence allows honest peers to assess the authenticity of online content by securely tabulating and managing endorsements from other peers. We employ a novel voter correlation scheme to weigh the opinions of peers, which gives rise to favorable incentives and system dynamics. We present simulation results indicating that our system is scalable, efficient, and robust.
In this paper, we describe the design and implementation of a distributed operating system for ad hoc networks. Our system simplifies the programming of ad hoc networks and extends total system lifetime by making the entire network appear as a single virtual machine. It automatically and transparently partitions applications into components and dynamically finds them a placement on nodes within the network to reduce energy consumption and to increase system longevity. This paper describes our programming model, outlines the design and implementation of our system and examines the energy efficiency of our approach through extensive simulations as well as validation of a deployment on a physical testbed. We evaluate practical, power-aware, general-purpose algorithms for component placement and migration, and demonstrate that they can significantly increase system longevity by effectively distributing energy consumption and avoiding hotspots.
Determining node and event locations is a canonical task for many wireless network applications. Yet dedicated infrastructure for determining position information is expensive, energy-consuming, and simply unavailable in many deployment scenarios. This paper presents an accurate, cheap and scalable framework, called Sextant, for determining node position and event location in sensor networks. Sextant operates by setting up and solving a system of geographic constraints based on connectivity information from the underlying communication network. Sextant achieves high accuracy by enabling non-convex constraints to be used to refine position estimates. It represents position estimates as potentially noncontiguous collections of points. This general representation enables Sextant to use negative information, that is, information on where a node or event is not located, to refine location estimates. Sextant unifies both node and event detection within the same general framework. It can provide high precision without dedicated localization hardware by aggressively extracting constraints from the link layer, representing areas precisely with Bézier-enclosed polygons and probability distributions, and using event detection to refine node position estimates. A compact representation and a fully distributed implementation make the framework practical for resource-limited devices. The framework has been implemented, deployed and tested on laptops, PDAs and Mica-2 motes. Physical experiments show that a large number (98%) of the nodes in a network can determine their positions based on a small number (30%) of landmark nodes and that a large number (90%) of events can be located with low median error.
Traditional operating system interfaces and network protocol implementations force system state to be kept on both sides of a connection. Such state ties the connection to an endpoint, impedes transparent failover, permits denial-of-service attacks, and limits scalability. This paper introduces a novel TCP-like transport protocol and a new interface to replace sockets that together enable all state to be kept on one endpoint, allowing the other endpoint, typically the server, to operate without any per-connection state. Called Trickles, this approach enables servers to scale well with increasing numbers of clients, consume fewer resources, and better resist denial-of-service attacks. Measurements on a full implementation in Linux indicate that Trickles achieves performance comparable to TCP/IP, interacts well with other flows, and scales well. Trickles also enables qualitatively different kinds of networked services. Services can be geographically replicated and contacted through an anycast primitive for improved availability and performance. Widely-deployed practices that currently have client-observable side effects, such as periodic server reboots, connection redirection, and failover, can be made transparent, and perform well, under Trickles. The protocol is secure against tampering and replay attacks, and the client interface is backwards-compatible, requiring no changes to sockets-based client applications.
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multi-resolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in largescale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growth-constrained metric, a low-dimensional Euclidean metric, or a metric of low doubling dimension. Large scale simulations, based on latency measurements from 6.25 million node-pairs as well as an implementation deployed on PlanetLab show that the framework is accurate and effective.
This paper describes Credence, a distributed object reputation management scheme for combating content pollution in peer-to-peer filesharing systems. Credence enables honest peers to assess the authenticity of online content by securely tabulating and managing endorsements from other peers. Credence employs a novel voter correlation scheme to weight peer opinions, which gives rise to favorable incentives and system dynamics. We present simulation results indicating that our system is scalable, efficient, and robust.
We present a simple rate matching-based mechanism for voltage adaptation in a microprocessor running a multiprogrammed workload. The mechanism incorporates a set of architecture and operating system extensions through which applications can communicate their actual and desired progress to the operating system. Using this feedback, the operating system uses a modified scheduling algorithm to run all applications at a single, globallyoptimal voltage. We demonstrate that significant energy savings are possible with a simple, practical set of extensions to the architecture and operating system.
Anonymity is increasingly important for networked applications amidst concerns over censorship and privacy. This paper outlines the design of HerbivoreFS, a scalable and efficient file sharing system that provides strong anonymity. HerbivoreFS provides computational guarantees that even adversaries able to monitor all network traffic cannot deduce the identity of a sender or receiver beyond an anonymizing clique of k peers. HerbivoreFS achieves scalability by partitioning the global network into smaller anonymizing cliques. Measurements on PlanetLab indicate that the system achieves high anonymous bandwidth when deployed on the Internet.
Name services are critical for mapping logical resource names to physical resources in large-scale distributed systems. The Domain Name System (DNS) used on the Internet, however, is slow, vulnerable to denial of service attacks, and does not support fast updates. These problems stem fundamentally from the structure of the legacy DNS. This paper describes the design and implementation of the Cooperative Domain Name System (CoDoNS), a novel name service, which provides high lookup performance through proactive caching, resilience to denial of service attacks through automatic load-balancing, and fast propagation of updates. CoDoNS derives its scalability, decentralization, self-organization, and failure resilience from peer-to-peer overlays, while it achieves high performance using the Beehive replication framework. Cryptographic delegation, instead of host-based physical delegation, limits potential malfeasance by namespace operators and creates a competitive market for namespace management. Backwards compatibility with existing protocols and wire formats enables CoDoNS to serve as a backup for legacy DNS, as well as a complete replacement. Performance measurements from a real-life deployment of the system in PlanetLab shows that CoDoNS provides fast lookups, automatically reconfigures around faults without manual involvement and thwarts distributed denial of service attacks by promptly redistributing load across nodes.
A secure, tamperproof execution environment is critical for trustworthy network computing. Newly emerging hardware, such as those developed as part of the TCPA and Palladium initiatives, enables operating systems to implement such an environment through Merkle hash trees. We examine the selection of optimal parameters, namely blocksize and tree depth, forMerkle hash trees based on the size of the memory region to be protected and the number of memory updates between updates of the hash tree. We analytically derive an expression for the cost of updating the hash tree, show that there is an optimal blocksize for the leaves of a Merkle tree for a given filesize and update interval that minimizes the cost of update operations, and describe a general method by which the parameters of such a tree can be determined optimally.
This paper describes staged simulation, a technique for improving the run time performance and scale of discrete event simulators. Typical network simulations are limited in speed and scale due to redundant computations, both within a single simulation run and between successive runs. Staged simulation proposes to restructure discrete event simulators to operate in stages that precompute, cache, and reuse partial results to drastically reduce redundant computation within and across simulations. We present a general and flexible framework for staging and identify the advantages and trade-offs of its application to wireless network simulations, a particularly challenging simulation domain. Experience with applying staged simulation to the ns2 simulator shows that staging can improve execution time by an order of magnitude or more and enable the simulation of wireless networks with tens of thousands of nodes.
Structured peer-to-peer hash tables provide decentralization, self-organization, failure-resilience, and good worst-case lookup performance for applications, but suffer from high latencies (O(logN)) in the average case. Such high latencies prohibit them from being used in many relevant, demanding applications such as DNS. In this paper, we present a proactive replication framework that can provide constant lookup performance for common Zipf-like query distributions. This framework is based around a closed-form optimal solution that achieves O(1) lookup performance with low storage requirements, bandwidth overhead and network load. Simulations show that this replication framework can realistically achieve good latencies, outperform passive caching, and adapt efficiently to sudden changes in object popularity, also known as flash crowds. This framework provides a feasible substrate for high-performance, low-latency applications, such as peer-to-peer domain name service.
CorSSO is a distributed service for authentication in networks. It allows application servers to delegate client identity checking to combinations of authentication servers potentially residing in separate administrative domains. In CorSSO, authentication policies enable the system to tolerate expected classes of attacks and failures. A novel partitioning of the work associated with authentication of principals means that the system scales well with increases in the numbers of users and services.
High lookup latencies prohibit peer-to-peer overlays from being used in many performance intensive applications, even though they provide self-organization, scalability, and failure resilience. In this paper, we show that lookup performance of structured DHTs can be improved to any desired constant, even under a single hop, by controlled proactive replication. By exploiting the popularity distribution of objects, we can minimize the number of replicas and reduce the storage and bandwidth cost of replication. This enables structured DHTs to efficiently support a wide variety of latency sensitive applications. We describe three different applications, namely DNS, web access, and content distribution, and show how they can derive significant performance gains by using DHTs.
This paper describes staged simulation, a technique for improving the run time performance and scale of discrete event simulators. Typical wireless network simulations are limited in speed and scale due to redundant computations, both within a single simulation run and between successive runs. Staged simulation proposes to reduce the amount of redundant computation within a simulation by restructuring discrete event simulators to operate in stages that precompute, cache, and reuse partial results. This paper presents a general and flexible framework for staging, and identifies the advantages and trade-offs of its application to wireless network simulations. Experience with applying staged simulation to the ns2 simulator shows that it can improve execution time by an order of magnitude in typical scenarios and make feasible the simulation of large scale wireless networks.
A central challenge in ad hoc networks is the design of routing protocols that can adapt their behavior to frequent and rapid changes in the network. The performance of proactive and reactive routing protocols varies with network characteristics, and one protocol may outperform the other in different network conditions. The optimal routing strategy depends on the underlying network topology, rate of change, and traffic pattern, and varies dynamically. This paper introduces the Sharp Hybrid Adaptive Routing Protocol (SHARP), which automatically finds the balance point between proactive and reactive routing by adjusting the degree to which route information is propagated proactively versus the degree to which it needs to be discovered reactively. SHARP enables each node to use a different application-specific performance metric to control the adaptation of the routing layer. This paper describes application-specific protocols built on top of SHARP for minimizing packet overhead, bounding loss rate, and controlling jitter. Simulation studies show that the resulting protocols outperform the purely proactive and purely reactive protocols across a wide range of network characteristics.
Peer-to-peer systems are typically designed around the assumption that all peers will willingly contribute resources to a global pool. They thus suffer from freeloaders, that is, participants who consume many more resources than they contribute. In this paper, we propose a general economic framework for avoiding freeloaders in peer-to-peer systems. Our system works by keeping track of the resource consumption and resource contribution of each participant. The overall standing of each participant in the system is represented by a single scalar value, called their karma. A set of nodes, called a bankset, keeps track of each nodeś karma, increasing it as resources are contributed, and decreasing it as they are consumed. Our framework is resistant to malicious attempts by the resource provider, consumer, and a fraction of the members of the bank set. We illustrate the application of this framework to a peer-to-peer filesharing application.
In this paper, we describe three novel analyses for eliminating unnecessary synchronization that remove over 70% of dynamic synchronization operations on the majority of our 15 benchmarks and improve the bottom-line performance of three by 37-53%. Our analyses attack three frequent forms of unnecessary synchronization: thread-local synchronization, reentrant synchronization, and enclosed lock synchronization. We motivate the design of our analyses with a study of the kinds of unnecessary synchronization found in a suite of single- and multithreaded benchmarks of different sizes and drawn from a variety of domains. We analyze the performance of our optimizations in terms of dynamic operations removed and run-time speedup. We also show that our analyses may enable the use of simpler synchronization models than the model found in Java, at little or no additional cost in execution time. The synchronization optimizations we describe enable programmers to design efficient, reusable and maintainable libraries and systems in Java without cumbersome manual code restructuring.
Anonymity is increasingly important for networked applications amidst concerns over censorship and privacy. In this paper, we describe Herbivore, a peer-to-peer, scalable, tamper-resilient communication system that provides provable anonymity and privacy. Building on dining cryptographer networks, Herbivore scales by partitioning the network into anonymizing cliques. Adversaries able to monitor all network traffic cannot deduce the identity of a sender or receiver beyond an anonymizing clique. In addition to strong anonymity, Herbivore simultaneously provides high efficiency and scalability, distinguishing it from other anonymous communication protocols. Performance measurements from a prototype implementation show that the system can achieve high bandwidths and low latencies when deployed over the Internet.
Topological changes in mobile ad hoc networks frequently render routing paths unusable. Such recurrent path failures have detrimental effects on the network ability to support QoS-driven services. A promising technique for addressing this problem is to use multiple redundant paths between the source and the destination. However, while multipath routing algorithms can tolerate network failures well, their failure resilience only holds if the paths are selected judiciously. In particular, the correlation between the failures of the paths in a redundant path set should be as small as possible. However, selecting an optimal path set is an NPcomplete problem. Heuristic solutions proposed in the literature are either too complex to be performed in real-time, or too ineffective, or both. This paper proposes a multipath routing algorithm, called Disjoint Pathset Selection Protocol (DPSP), based on a novel heuristic that, in nearly linear time on average, picks a set of highly reliable paths. The convergence to a highly reliable path set is very fast, and the protocol provides flexibility in path selection and routing algorithm. Furthermore, DPSP is suitable for real-time execution, with nearly no message exchange overhead and with minimal additional storage requirements. This paper presents evidence that multipath routing can mask a substantial number of failures in the network compared to single path routing protocols, and that the selection of paths according to DPSP can be beneficial for mobile ad hoc networks, since it dramatically reduces the rate of route discoveries.
This paper presents an approach for formally specifying and enforcing security policies on web service implementations. Networked services in general, and web services in particular, require extensive amounts of code to ensure that clients respect siteintegrity constraints. We provide a language by which these constraints can be expressed and enforced automatically, portably and efficiently. Security policies in our system are specified in a language based on temporal logic, and are processed by an enforcement engine to yield site and platform-specific access control code. This code is integrated with a web server and platform specific libraries to enforce the specified policy on a given web service. Our approach decouples the security policy specification from service implementations, provides a mandatory access control model for web services, and achieves good performance. We show that up to 22% of the code in a traditional web service module is dedicated to security checking functionality, including checks for client sequencing and parameter validation. We show that our prototype language implementation, WebGuard, enables web programmers to significantly reduce the amount of security checking code they need to develop manually. The quality of the code generated by WebGuard from formal policy specifications is competitive with the latency of handcrafted code to within a few percent.
Ad hoc and sensor networks are an important, emerging niche that is poorly supported by existing operating systems. In this paper, we argue that network-wide energy management is a primary concern in ad hoc networks, and that this functionality is best provided by a systems layer. We are currently designing and implementing a distributed, power-aware, adaptive operating system, called MagnetOS, specifically targeting ad hoc and sensor networks. MagnetOS provides a single system image of a unified Java virtual machine across the nodes that comprise an ad hoc network. By automatically and transparently partitioning applications into components and dynamically placing these components on nodes within the ad hoc network, our system reduces energy consumption, avoids hotspots and increases system longevity. We show that a systems approach to automatic object placement in an ad hoc network can increase system longevity by a factor of four to five.
In this paper, we describe PortOS, an educational operating system designed to complement undergraduate and graduate level classes on operating systems. PortOS is a complete user-level operating system project, with phases covering concurrency, synchronization, networking and file systems. It focuses particularly on ad-hoc and peer-to-peer distributed computing on mobile devices. This paper discusses alternative approaches to operating system projects, and presents our particular design point along with pedagogical justifications.
Anonymity is critical for many networked applications. Yet current Internet protocols provide no support for masking the identity of communication endpoints. This paper outlines a design for a peer-to-peer, scalable, tamper-resilient communication protocol that provides strong anonymity and privacy. Called CliqueNet, our protocol provides an information-theoretic guarantee: an omnipotent adversary that can wiretap at any location in the network cannot determine the sender of a packet beyond a clique, that is, a set of k hosts, where k is an anonymizing factor chosen by the participants. CliqueNet is resilient to jamming by malicious hosts and can scale with the number of participants. This paper motivates the need for an anonymous communication layer and describes the self-organizing, novel divide-and-conquer approach that enables CliqueNet to scale while offering a strong anonymity guarantee. CliqueNet is widely applicable as a communication substrate for peer-to-peer applications that require anonymity, privacy and anti-censorship guarantees.
Developing applications for ad-hoc and sensor networks poses significant challenges. Many interesting applications in these domains entail collaboration between components distributed throughout an ad-hoc network. Defining these components, optimally placing them on nodes in the ad-hoc network and relocating them in response to changes is a fundamental problem faced by such applications. Manual approaches to code and data migration are not only platform-dependent and error-prone, but also needlessly complicate application development. Further, locally optimal decisions made by applications that share the same network can lead to globally unstable and energy inefficient behavior.In this paper we describe the design and implementation of a distributed operating system for adhoc and sensor networks whose goal is to enable power-aware, adaptive, and easy-to-develop adhoc networking applications. Our system achieves this goal by providing a single system image of a unified Java virtual machine to applications over an ad-hoc collection of heterogeneous nodes. It automatically and transparently partitions applications into components and dynamically finds a placement of these components on nodes within the ad-hoc network to reduce energy consumption and increase system longevity. This paper outlines the design of our system and evaluates two practical, power-aware, online algorithms for object placement that form the core of our system. We demonstrate that our algorithms can increase system longevity by a factor of four to five by effectively distributing energy consumption, and are suitable for use in an energy efficient operating system in which applications are distributed automatically and transparently.
This paper describes the motivation, architecture and performance of a distributed virtual machine (DVM) for networked computers. DVMs rely on a distributed service architecture to meet the manageability, security and uniformity requirements of large, heterogeneous clusters of networked computers. In a DVM, system services, such as verification, security enforcement, compilation and optimization, are factored out of clients and located on powerful network servers. This partitioning of system functionality reduces resource requirements on network clients, improves site security through physical isolation and increases the manageability of a large and heterogeneous network without sacrificing performance. Our DVM implements the Java virtual machine, runs on x86 and DEC Alpha processors and supports existing Java-enabled clients.
The Java Virtual Machine (JVM) has emerged as a ubiquitous platform for network computing. JVMs have been incorporated into browsers, web servers, database engines, and personal digital assistants, and there are plans under way to use them on embedded devices and smartcards. The adoption of JVMs into such diverse and security critical domains requires that their safety and security be tested rigorously. Unfortunately, Java virtual machines are large, complex, and have many subtleties, and consequently testing them is a difficult task. Commercial virtual machines deployed so far have had to rely on manual and ad hoc techniques for testing, and have subsequently exhibited substantial weaknesses that could lead to information theft or destruction. In this paper, we describe our experience with automatically testing Java virtual machines. We outline two effective automatic testing techniques for JVMs. The first technique is comparative evaluation with mutations, where randomly perturbed test inputs are used to identify discrepancies between different versions of JVMs. We show that this fast and effective technique achieves broad code coverage, and discuss its shortcomings. Second, we present a well-structured technique for generating complex test cases from cogent grammar descriptions. We describe lava, a special purpose language we developed to specify production grammars for testing, and show that grammar-based test generation can produce very complex test cases from compact specifications. The testing process is easy to steer, and can generate targeted test cases. Most importantly, grammars enable the tester to reason about the expected system behavior on the test inputs. We show how grammars can be used in conjunction with inductive proofs to solve the oracle problem, and describe various application areas for grammar based testing. Finally, we report the results of applying these techniques to commercial Java virtual machine implementations.
Extensible typesafe systems, such as Java, rely critically on a large and complex software base for their overall protection and integrity, and are therefore difficult to test and verify. Traditional testing techniques, such as manual test generation and formal verification, are too time consuming, expensive, and imprecise, or work only on abstract models of the implementation and are too simplistic. Consequently, commercial virtual machines deployed so far have exhibited numerous bugs and security holes.In this paper, we discuss our experience with using production grammars in testing large, complex and safety-critical software systems. Specifically, we describe lava, a domain specific language we have developed for specifying production grammars, and relate our experience with using lava to generate effective test suites for the Java virtual machine. We demonstrate the effectiveness of production grammars in generating complex test cases that can, when combined with comparative and variant testing techniques, achieve high code and value coverage. We also describe an extension to production grammars that enables concurrent generation of certificates for test cases. A certificate is a behavioral description that specifies the intended outcome of the generated test case, and therefore acts as an oracle by which the correctness of the tested system can be evaluated in isolation. We report the results of applying these testing techniques to commercial Java implementations. We conclude that the use of production grammars in combination with other automated testing techniques is a powerful and effective method for testing software systems, and is enabled by a special purpose language for specifying extended production grammars.
This paper presents and evaluates a set of analyses designed to reduce synchronization overhead in Java programs. Monitor-based synchronization in Java often causes significant overhead, accounting for 5-10% of total execution time in our benchmark applications. To reduce this overhead, programmers often try to eliminate unnecessary lock operations by hand. Such manual optimizations are tedious, error-prone, and often result in poorly structured and less reusable programs. Our approach replaces manual optimizations with static analyses that automatically find and remove unnecessary synchronization from Java programs. These analyses optimize cases where a monitor is entered multiple times by a single thread, where one monitor is nested within another, and where a monitor is accessible by only one thread. A partial implementation of our analyses eliminates up to 70% of synchronization overhead and improves running time by up to 5% for several already hand-optimized benchmarks. Thus, our automated analyses have the potential to significantly improve the performance of Java applications while enabling programmers to design simpler and more reusable multithreaded code.
In emerging application domains, such as thin client computing and hypertext systems with embedded objects (e.g. the World Wide Web), the process of downloading application code is in the critical path of users. We observe that in these domains, compliance with existing standards and minimizing the impact on the clients is crucial. We argue that the fundamental problem for mobile code is that the units of code distribution in networked object systems, such as Java, are not suited for efficient utilization of network bottlenecks. In this paper, we propose a separate optimization step, between compilation and loading, whereby application code is restructured to more effectively use the available network bandwidth for program download. We have designed and implemented such an optimization step as a binary rewriting service for Java applets and applications. Our implementation does not require any modifications to existing Java virtual machines, compilers or clients. We have found that restructuring of Java applications can improve program startup times by up to 30%.
Modern virtual machines, such as Java and Inferno, are emerging as network computing platforms. While today’s virtual machines provide higher-level abstractions and more sophisticated services than their predecessors, and while they have migrated from dedicated mainframes to heterogeneous networked computers, their architecture has essentially remained intact. State of the art virtual machines are still monolithic, that is, all system components reside on the same host and are replicated among all clients in an organization. This crude replication of services among clients creates problems of security, manageability, performance and scalability.We propose a distributed architecture for virtual machines based on distributed service components. In our proposed system, services that control security, resource management, and code optimization are factored out of clients and reside in enterprisewide network servers. The services produce self-certifying, self-regulating, selfoptimizing programs via binary rewriting. We are currently building a Java virtual machine based on this architecture. We argue that distributed virtual machine architectures enable higher integrity, manageability, performance and scalability than monolithic virtual machines where all components reside on all clients.
Modern virtual machines, such as Java and Inferno, are emerging as network computing platforms. While these virtual machines provide higher-level abstractions and more sophisticated services than their predecessors from twenty years ago, their architecture has essentially remained unchanged. State of the art virtual machines are still monolithic, that is, they are comprised of closely-coupled service components, which are thus replicated over all computers in an organization. This crude replication of services forms one of the weakest points in today’s networked systems, as it creates widely acknowledged and well-publicized problems of security, manageability and performance.We have designed and implemented a new system architecture for network computing based on distributed virtual machines. In our system, virtual machine services that perform rule checking and code transformation are factored out of clients and are located in enterprise-wide network servers. The services operate by intercepting application code and modifying it on the fly to provide additional service functionality. This architecture reduces client resource demands and the size of the trusted computing base, establishes physical isolation between virtual machine services and creates a single point of administration. We demonstrate that such a distributed virtual machine architecture can provide substantially better integrity and manageability than a monolithic architecture, scales well with increasing numbers of clients, and does not entail high overhead.
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system’s interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and functionality. SPIN uses language and link-time mechanisms to inexpensively export fine-grained interfaces to operating system services. Extensions are written in a type safe language, and are dynamically linked into the operating system kernel. This approach offers extensions rapid access to system services, while protecting the operating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations.
Application domains such asmultimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed interfaces and implementations to system services and resources. This makes them inappropriate for applications whose resource demands and usage patterns are poorly matched by the services provided. The SPIN operating system enables system services to be defined in an application-specific fashion through an extensible microkernel. It offers applications fine-grained control over a machineś logical and physical resources through run-time adaptation of the system to application requirements.
Application domains such asmultimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed interfaces and implementations to system services and resources. This makes them inappropriate for applications whose resource demands and usage patterns are poorly matched by the services provided. The SPIN operating system enables system services to be defined in an application-specific fashion through an extensible microkernel. It offers applications fine-grained control over a machineś logical and physical resources through run-time adaptation of the system to application requirements.
Application domains, such as multimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed interfaces and implementations to system services and resources. This makes them inappropriate for applications whose resource demands and usage patterns are poorlymatched by the services provided. The SPIN operating systemenables system services to be defined in an application-specific fashion through an extensible microkernel. It offers fine-grained control over a machineś logical and physical resources to applications through run-time adaptation of the system to application requirements.
A list of all my talks can be found here.
| A fast, consistent, fault-tolerant key-value store | |
| An open Paxos replicated state machine implementation | |
| Operating system for trustworthy computing | |
| A decentralized search plug-in for Azureus | |
| A new bibliographic management tool to replace BibTeX | |
| An optimal failure detector implementation | |
| An open service that maps a client to the nearest server | |
| An open-access content distribution network, similar to Akamai | |
| LimeWire client for filesharing with reputation management | |
| Lightweight, scalable network positioning | |
| Replacement and safety net for DNS | |
| High performance, scalable wireless network simulator | |
| An instructional OS platform for Windows | |
| Java verification and disassembly | |
| Extensible operating system | |
| MIPS simulator | |
| Dive computer |
| Co-Chair, Usenix Annual Technical Conference, 2013. | |
| Dependable Systems and Networks, 2013. | |
| Co-Chair, Workshop on Hot Topics in Networks, 2012. | |
| Symposium on Operating System Design and Implementation, 2012. | |
| ACM SIGCOMM Conference, 2012. | |
| ACM Symposium on Networked System Design and Implementation, 2012. | |
| European Conference on Computer Systems, 2012. |
|
|
I enjoy sailing (minitransats, I14s, sailboards), backcountry skiing and photography.
For visitors: Directions to the Cornell CS Department.