Even when hidden, message passing lies at the heart of any distributed system. A tremendous number of message passing interfaces and protocols have been developed both by the practical and theoretical computer science community. Efforts to bring structure to all this development have been only partially successful. Today, this lack of structure impedes the engineering of large, complex distributed systems. For example, a variety of both fault-tolerance and multi-media protocols are readily available. Yet it would be tremendously complex to implement a large fault-tolerant multi-media system. The integration of subsystems that provide different protocols into a working whole requires intimate knowledge of the internals of each subsystem, and considerable creativity to make them interplay.
Here, we adopt a perspective that treats a protocol as an abstract data type: a software module with standardized top and bottom interfaces. Above a protocol module are other protocols or applications that issue requests to it. The protocol itself functions by adding headers to messages, or generating new messages of its own, whereby it interacts with the corresponding module on a remote system. The lower interface permits the module to receive incoming messages, together with other sorts of events.
In most systems, this modular structure is obscured. Each subsystem may have its own top and bottom-level interfaces, its own message data structure, and its own methods of scheduling internal and external events. Interconnecting the different interfaces, converting between the different message formats, and running the different schedulers concurrently arise as challenges that the application developer must resolve. Network standardization has focused mainly on the message formats, permitting processes running on different systems to communicate. The seemingly simpler problem of composing subsystems on the same operating system has received much less attention.
The need is for a single system that has one message format, one event scheduler, and a framework allowing protocol composition. Composition requires that the top-level and bottom-level interfaces of the protocols be identical for each layer, so they can be stacked on top of each other like LEGO.9extm blocks (see Figure 1). The protocol interface must be sufficiently strong to support most protocols, containing hooks with which the interface can be extended to add new features. Luckily, work on object-oriented systems has addressed exactly these requirements. If we can specify protocols in terms of objects, then we can use existing object-oriented techniques for composition of these protocols.
The Horus system provides such an object-oriented protocol composition framework. The system supports objects for communication endpoints, groups of communicating endpoints, and messages. It currently includes a library of about thirty different protocols, each providing a particular communication feature. Protocols can be composed in many ways, allowing flexibility and having the additional advantage that an application pays only for properties it uses. Horus can support many applications concurrently, each of which can be configured individually. Horus supports non-Horus subsystems by providing a separate scheduling environment for each subsystem, and a system-call interception technique that traps system calls made by the subsystem. This gives Horus complete control over each subsystem and an inexpensive way to communicate with it.
Horus arose out of our prior work on fault-tolerant process group computing in Isis system [4]. Isis supports a virtually synchronous process group communication environment in which software fault-tolerance was applied to a variety of problems. Isis supported process groups with mechanisms for joining a group and obtaining its state, leaving a group (a failed process is automatically dropped from the groups to which it belonged), and communicating with groups using atomic, ordered multicasts. These primitive functions were used to support tools for locking and replicating data, load-balancing, guaranteed execution, primary-backup fault-tolerance, parallel computation, and system control and management. Horus focuses on the core of Isis, implementing a very powerful process group communication architecture which can be used in support of Isis-like tools, embedded into programming languages or parallel computing libraries, or hidden behind standard abstractions such as UNIX sockets.
In this paper we discuss Horus in relatively practical terms, omitting theory that has been explored elsewhere and pursuing new theoretical directions suggested by Horus only in a limited, preliminary fashion. The paper describes a very simple protocol architecture that is still powerful enough to support the most important styles of distributed protocols used in modern systems. It should be stressed that this layered architecture does not imply a high overhead; indeed, the cost of a layer can be as low as just a few instructions at runtime, and a few bytes (or none at all) added to a message. Our experience with Horus bears this out: with reasonable effort one can achieve performance fully comparable to the best existing systems for similar environments [15].
Although much remains to be done, we have also started to develop formal tools for specification of protocol layers, as well building reference implementations of the most critical Horus layers. In this use, a specification is a skeletal description of the behavior of a layer, giving the requirements the layer makes on layers above and below it, and the guarantees the layer provides in situations where the requirements hold. A reference implementation is an formalized version of a layer, potentially executable, but developed primarily to facilitate the use of formal proof tools for verification. Our specification and reference language is a subset of ML, while the language of preference for highly optimized versions of layers is C or C++. Demanding applications would normally use the more optimized layers, which sometimes combine the functions of several reference layers into a single high performance production version. Our contention is that by providing high level, executable, descriptions of key parts of Horus, the system can be significantly hardened. This approach is discussed in detail in Section 8.
Horus is thus a multi-faceted effort. The project seeks to contribute a powerful and flexible programming environment for distributed application development, focusing on issues of fault-tolerance, consistency and security using process groups and (if desired) virtual synchrony. We do this in a principled and modular manner that facilitates the use of our system to implement protocols with goals different from our own. Moreover, Horus creates a framework within which formal methods can be brought to bear on such problems as protocol specification and verification.