The Process Group Approach to Reliable Distributed Computing
Kenneth P. Birman, CACM, 36(12): 37-53, Dec 1993
Notes by Indranil Gupta, March 13, 1999.
Goal of the article: Review 10 years of research on the Isis system, covering the motivation,
the approach(es) taken, and experiences with real-life applications.
The Motivation: Development of reliable distributed software can be
simplified using process groups and group programming tools.
The need for Process Group Services
- Several applications inherently use process groups, e.g., brokerage and trading systems:
reliable publishing of and subscribing to messages => replicated data => process groups.
- Programmers do not have to write group communication code from scratch.
Impossibility Results (not in this paper)
- The group membership problem cannot be solved in asynchronous systems of the
primary-partition type (i.e., systems that seek to maintain a single view of the group
membership across the group), even if killing processes is allowed [1]. The Isis
specification falls into this category.
- Impossibility has not (yet?) been proved for specifications that allow multiple views
across a group. However, such group membership specifications must be strong enough to
rule out useless group membership protocols, yet weak enough to be solvable [1].
Types of Process Groups
- Anonymous groups: need a way to send messages to a group address; all-or-none,
exactly-once message delivery; ordered message delivery; and a history (of messages and
events) that is reflected consistently in the current state of every process in the group.
- Explicit groups: members cooperate directly, using algorithms that rely on the list of
members, members' relative ranks in that list, etc. Additional needs: membership changes
must be announced to the group, and all group members must see the same membership.
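Why consistent membership matters for explicit groups: with the same ranked list at every member, work can be divided by rank alone, with no extra messages. A minimal sketch, assuming a hypothetical scheme (not an Isis API) in which request r is handled by the member whose rank equals r mod the group size; the function name my_share is illustrative:

```python
def my_share(view: list[str], me: str, requests: list[int]) -> list[int]:
    """Requests handled by `me`, given the ranked membership list `view`.

    Hypothetical scheme: request r belongs to the member whose rank (position
    in `view`) equals r mod len(view). It only works if every member sees the
    same `view`; members with different views may skip or duplicate requests.
    """
    rank = view.index(me)
    return [r for r in requests if r % len(view) == rank]


view = ["a", "b", "c"]  # the same ranked list at every member
for member in view:
    print(member, my_share(view, member, list(range(7))))
# a [0, 3, 6]
# b [1, 4]
# c [2, 5]
```

If b alone believed the view were ["a", "b"], it would take requests 1, 3, 5: requests 3 and 5 would be handled twice and request 4 by nobody.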
Problems to be tackled
- Weak support for reliable communication (e.g., when channels break)
- Group address expansion
- Delivery ordering for concurrent messages
- Delivery ordering for related messages
- Synchronization
- Failure atomicity (failure model: fail-stop; no Byzantine failures)
Isis' solutions to the above problems
- The ideal solution: close synchrony. It solves all of the above problems in a simple but
expensive way. It is related to Lamport and Schneider's state machine approach, which
specifies synchronous lockstep execution of processes. Solutions it offers to the above
problems:
- Weak communication reliability guarantee: each multicast is a 'single event' in the
system.
- Group address expansion: at the instant of multicast delivery.
- Delivery ordering for concurrent messages: total ordering.
- Delivery ordering for related messages: ensured by the total ordering above.
- Synchronization: solved by state transfer.
- Failure atomicity: ensured by multicast being a 'single event'.
- Isis' integrated solution to the above problems: virtual synchrony.
- Permits asynchronous executions for which there exists some indistinguishable closely
synchronous execution. Events are synchronized only to the degree that the application is
sensitive to event ordering. The programmer can develop code assuming a closely
synchronous model. Communication, process group membership changes, and failures are all
treated within a single event-oriented execution model.
- Order sensitivity of messages
- abcast: atomic delivery ordering. Easy but expensive to implement. High latencies.
- cbcast: causal ordering (based on Lamport's happened-before relation). Lower latency.
- abcast can be built on cbcast.
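Causal (cbcast-style) delivery is commonly implemented with vector timestamps. The following is a minimal sketch of that technique, assuming a fixed group of n members, channels that eventually deliver every message, and no failures or view changes; Isis's actual protocol handles those cases and differs in detail, and the class and method names are hypothetical:

```python
class CausalMember:
    """One member of a fixed n-member group doing vector-timestamp causal delivery."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.vt = [0] * n      # vt[k] = number of messages from member k delivered here
        self.pending = []      # received messages that are not yet deliverable
        self.delivered = []    # payloads, in local delivery order

    def send(self, payload):
        """Timestamp a new multicast; the sender delivers its own message at once."""
        self.vt[self.pid] += 1
        self.delivered.append(payload)
        return (self.pid, list(self.vt), payload)  # copy the vector into the message

    def _deliverable(self, msg) -> bool:
        sender, ts, _ = msg
        # Must be the next message from its sender ...
        if ts[sender] != self.vt[sender] + 1:
            return False
        # ... and everything the sender had delivered before sending it must be delivered here
        return all(ts[k] <= self.vt[k] for k in range(len(ts)) if k != sender)

    def receive(self, msg) -> None:
        """Buffer `msg`, then deliver every buffered message whose causal predecessors are in."""
        self.pending.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.pending):
                if self._deliverable(m):
                    self.pending.remove(m)
                    sender, ts, payload = m
                    self.vt[sender] = ts[sender]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalMember(i, 3) for i in range(3))
m1 = p0.send("question")
p1.receive(m1)
m2 = p1.send("answer")   # causally after m1, because p1 delivered m1 first
p2.receive(m2)           # arrives before m1 at p2, so it is buffered
p2.receive(m1)           # delivers m1, which unblocks m2
print(p2.delivered)      # ['question', 'answer']
```

p2 never delivers "answer" before "question", even though the two arrived in the opposite order. Concurrent (causally unrelated) messages, by contrast, may still be delivered in different orders at different members; that is exactly what abcast additionally rules out.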
- Other Isis techniques
- Maintaining group membership at each process.
- State transfer.
- Asynchronous pipelined communication.
- A meaningful notion of group state and state transfer, both for replicated data and for
computation dynamically partitioned among group members.
- Failure handling through a consistently presented group membership list that is
integrated into the communication system (compare with the usual approach of detecting
failures through timeouts).
- A three-round multicast protocol, which solves the problem of a sender failing after its
multicast has been delivered to only some of the receivers.
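The failure case the three-round protocol addresses (a sender crashing after only some receivers got its multicast) can be illustrated with a simpler flush-style step at the view change: before the failed member is dropped, survivors relay any of its messages they hold but cannot yet treat as stable. This is a hedged sketch, not the paper's protocol; it assumes a reliable failure notification, ignores delivery ordering and concurrent failures, and all names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    delivered: dict = field(default_factory=dict)  # msg_id -> payload
    unstable: dict = field(default_factory=dict)   # msg_id -> (sender, payload), not yet known to be everywhere

    def receive(self, sender: str, msg_id, payload) -> None:
        if msg_id in self.delivered:
            return  # a relayed duplicate: keep delivery exactly-once
        self.delivered[msg_id] = payload
        self.unstable[msg_id] = (sender, payload)


def install_view_without(view: list, failed: str) -> list:
    """Drop `failed` from the view after relaying its unstable messages, so that
    either every survivor delivers a given message from it or none does."""
    survivors = [m for m in view if m.name != failed]
    for m in survivors:
        for msg_id, (sender, payload) in list(m.unstable.items()):
            if sender == failed:
                for other in survivors:
                    other.receive(sender, msg_id, payload)
    # Every survivor now holds these messages, so they can be treated as stable
    for m in survivors:
        m.unstable = {k: v for k, v in m.unstable.items() if v[0] != failed}
    return survivors


s, a, b, c = (Member(name) for name in "sabc")
a.receive("s", ("s", 1), "update-42")        # s crashes after reaching only a
view = install_view_without([s, a, b, c], failed="s")
print([m.name for m in view])                        # ['a', 'b', 'c']
print(all(("s", 1) in m.delivered for m in view))    # True
```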
- Other Isis tools
- Replication tool for managing replicated data.
- Tool for fault-tolerant primary-backup server design.
- Synchronization tool to support a form of locking.
- Checkpoint/update logging, spooling for state recovery from failure.
- Token passing for synchronization.
- Monitoring sites for failure.
- The first member of a group is initialized by the Isis software; later members join and
obtain the group state by state transfer.
- Support for triggers: applications are notified when a trigger condition becomes true.
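The join-plus-state-transfer rule can be modeled in a few lines. This is a hedged, single-process sketch: in Isis the transfer happens over the network and is coordinated with the view change, whereas here the class Group, its methods, and the choice of the oldest member as the state donor are all hypothetical simplifications:

```python
import copy


class Group:
    """Toy single-process model of a group whose members keep identical state."""

    def __init__(self, first_member: str):
        # The first member starts from an initial (here: empty) state; nobody to copy from
        self.view = [first_member]   # ranked membership list; index 0 is the oldest member
        self.view_id = 0
        self.state = {first_member: {}}

    def multicast_update(self, key, value) -> None:
        # Every member of the current view applies the same update
        for member in self.view:
            self.state[member][key] = value

    def join(self, newcomer: str) -> None:
        # State transfer: copy the oldest member's state, then install a new view that
        # includes the newcomer. Updates multicast before the join are in the copy;
        # updates multicast after it are applied to the newcomer directly.
        donor = self.view[0]
        self.state[newcomer] = copy.deepcopy(self.state[donor])
        self.view = self.view + [newcomer]
        self.view_id += 1


g = Group("p0")
g.multicast_update("x", 1)
g.join("p1")
g.multicast_update("y", 2)
print(g.view_id, g.view, g.state["p1"] == g.state["p0"])  # 1 ['p0', 'p1'] True
```

Tying the copy to the view change is what guarantees that the newcomer neither misses an update nor applies one twice.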
- Typical group styles for which Isis is optimized
- Peer group
- Client-server group
- Diffusion group
- Hierarchical group
- Who uses Isis and how
- Brokerage systems (the best-known application of Isis): Isis provides fault tolerance at
the file/database level, plus tools for fault tolerance at the lower, local-file level.
- Publish-subscribe facilities in stock exchanges.
- Database replication and triggers.
- Others: NEWS, NMGR, DECEIT, META/LOMITA SPOOLER, AEGIS.
- Disadvantages, limitations, and issues not considered in Isis
- Reduced availability during LAN partitions: progress is allowed in only one partition
(the primary partition), which must hold a majority of the group. Thus Isis tolerates at
most ceil(n/2) - 1 simultaneous failures in an n-member group.
- How does Isis compare with the transactional serializability model?
- Real-time issues not considered.
- Persistence of databases and files not considered.
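The majority rule behind the primary-partition limit can be written out directly. A minimal sketch, assuming the primary partition is the side that holds a strict majority of the current n-member view (the function names are hypothetical):

```python
def max_tolerated_failures(n: int) -> int:
    """Most simultaneous failures after which a strict majority of an n-member view survives."""
    return (n - 1) // 2  # the same value as ceil(n/2) - 1


def may_make_progress(partition_size: int, view_size: int) -> bool:
    """Primary-partition rule: only a side holding a strict majority of the view may continue."""
    return 2 * partition_size > view_size


for n in range(3, 8):
    print(n, max_tolerated_failures(n))
# 3 1
# 4 1
# 5 2
# 6 2
# 7 3
print(may_make_progress(3, 5), may_make_progress(2, 4))  # True False
```

The second call shows the availability cost: when a 4-member group splits 2/2, neither side holds a majority, so neither may make progress.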
Questions/Topics for discussion
- How should messages be ordered across group boundaries?
- How is the application programmer given control over the propagation of causality
information?
- Should a group communication system like Isis be installed as a user-level library or in
the OS? (hint: better performance)
- Further, how does the system manage to handle some groups using cbcast and some using
abcast?
- What about groups that manage the group membership (of other groups) ?
Further Readings
- [1] On the Impossibility of Group Membership, Tushar Chandra, Vassos Hadzilacos, Sam Toueg,
Bernadette Charron-Bost, ncstrl.cornell/TR95-1548.
- [2] Design Alternatives for Process Group Membership and Multicast, Kenneth P. Birman,
Robert Cooper, Barry Gleeson, ncstrl.cornell/TR91-1257.