
4   Event protocol: Intra-stack communication

Ensemble embodies two forms of communication. The first is communication between protocol stacks in a group, using messages sent via some communication transport. The second is intra-stack communication between protocol layers sharing a protocol stack (see fig ??), using Ensemble events (see page ?? for an overview of Ensemble events). One use of events is for passing information directly related to messages (e.g., broadcast messages are usually associated with ECast events). However, events are also used for notifying layers of group membership changes, telling other layers about suspected failed members, synchronizing protocol layers for view changes, passing acknowledgment and stability information, requesting alarms and delivering timeouts, and so on. For a set of protocol layers to implement a higher-level protocol harmoniously, they have to agree on what these various events ``mean,'' and in general follow what is called here the Ensemble event protocol.

The layering in Ensemble is advantageous because it allows complex protocols to be decomposed into a set of smaller, more understandable protocols. However, layering also introduces the event protocol, which complicates the system by adding intra-stack communication (the event protocol) to inter-stack communication (normal message communication).

Be aware that this information may become out of date. Although the ``spirit'' of the information presented here is unlikely to change in drastic ways, always consider the possibility that this information does not exactly match that in type/event.ml and type/event.mli. Please alert us when such inconsistencies are discovered so they may be corrected.



Figure 1: Events are used for intra-stack communication: layers can only communicate with other layers by modifying events; layers never read or modify other layers' message headers. Messages are used for inter-stack communication: only messages are sent between group members; events are never sent between members.


The documentation of the event protocol proceeds as follows.

4.1   Event Types




    (* These events should have messages associated with them. *)
  | ECast				(* Multicast message *)
  | ESend				(* Pt2pt message *)
  | ESubCast				(* Multi-destination message *)
  | ECastUnrel				(* Unreliable multicast message *)
  | ESendUnrel				(* Unreliable pt2pt message *)
  | EMergeRequest			(* Request a merge *)
  | EMergeGranted			(* Grant a merge request *)
  | EOrphan				(* Message was orphaned *)

    (* These types do not have messages. *)
  | EAccount				(* Output accounting information *)
(*| EAck			      *)(* Acknowledge message *)
  | EAsync				(* Asynchronous application event *)
  | EBlock				(* Block the group *)
  | EBlockOk				(* Acknowledge blocking of group *)
  | EDump				(* Dump your state (debugging) *)
  | EElect				(* I am now the coordinator *)
  | EExit				(* Disable this stack *)
  | EFail				(* Fail some members *)
  | EGossipExt				(* Gossip message *)
  | EGossipExtDir			(* Gossip message directed at particular address *)
  | EInit				(* First event delivered *)
  | ELeave				(* A member wants to leave *)
  | ELostMessage			(* Member doesn't have a message *)
  | EMergeDenied			(* Deny a merge request *)
  | EMergeFailed			(* Merge request failed *)
  | EMigrate				(* Change my location *)
  | EPresent                            (* Members present in this view *)
  | EPrompt				(* Prompt a new view *)
  | EProtocol				(* Request a protocol switch *)
  | ERekey				(* Request a rekeying of the group *)
  | ERekeyPrcl				(* The rekey protocol events *)
  | ERekeyPrcl2				(*                           *)
  | EStable				(* Deliver stability down *)
  | EStableReq				(* Request for stability information *)
  | ESuspect				(* Member is suspected to be faulty *)
  | ESystemError			(* Something serious has happened *)
  | ETimer				(* Request a timer *)
  | EView				(* Notify that a new view is ready *)
  | EXferDone				(* Notify that a state transfer is complete *)
  | ESyncInfo
      (* Ohad, additions *)
  | ESecureMsg				(* Private Secure messaging *)
  | EChannelList			(* passing a list of secure-channels *)
  | EFlowBlock				(* Blocking/unblocking the application for flow control*)
(* Signature/Verification with PGP *)
  | EAuth

  | ESecChannelList                     (* The channel list held by the SECCHAN layer *)
  | ERekeyCleanup
  | ERekeyCommit 

Figure 2: Definition of the event type typ. Taken from type/event.mli.


This section describes the different types of events. See fig ?? for the source code of the enumerated types. The behavior of a layer depends not only on the event type and its fields, but also on the direction from which the event arrives. For example, an ESend event travels down from the application in the sender's stack, and up from the bottom to the application in the receiver's stack. A layer therefore handles the same event type quite differently depending on whether the message is being sent or received. In what follows, we sometimes specifically include the event direction. Detailed Descriptions:
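The dependence on direction can be illustrated with a minimal sketch. The types and the describe function below are a toy model written for this example only; they are not the definitions in type/event.mli, and the strings are only informal descriptions.

```ocaml
(* Toy model for illustration; NOT the definitions in type/event.mli. *)
type typ = ECast | ESend | ETimer
type dir = Up | Dn

(* Hypothetical helper: what a layer would do with an event, given both
   the event type and the direction from which it arrives. *)
let describe (d : dir) (t : typ) : string =
  match d, t with
  | Dn, ESend -> "sender side: pt2pt message travelling down toward the transport"
  | Up, ESend -> "receiver side: pt2pt message travelling up toward the application"
  | Dn, ECast -> "sender side: multicast travelling down toward the transport"
  | Up, ECast -> "receiver side: multicast travelling up toward the application"
  | Dn, ETimer -> "timer request travelling down to BOT"
  | Up, ETimer -> "alarm travelling up from BOT"

let () =
  print_endline (describe Dn ESend);
  print_endline (describe Up ESend)
```

Matching on the pair (direction, type) rather than on the type alone makes the compiler check that both directions are handled for every event type in the toy model.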

4.2   Event fields

Here we describe all the fields that appear in the events. The type definitions appear in fig ?? and fig ??. Default values for the fields appear in fig ??.



type field =
      (* Common fields *)
  | Type        of typ            (* type of the message*)
  | Peer        of rank           (* rank of sender/destination *)
  | Iov	        of Iovecl.t       (* payload of message *)
  | ApplMsg                       (* was this message generated by an appl? *)

      (* Uncommon fields *)
  | Address     of Addr.set	  (* new address for a member *)
  | Failures    of bool Arrayf.t  (* failed members *)
  | Presence    of bool Arrayf.t  (* members present in the current view *)
  | Suspects    of bool Arrayf.t  (* suspected members *)
  | SuspectReason of string	  (* reasons for suspicion *)
  | Stability   of seqno Arrayf.t (* stability vector *)
  | NumCasts    of seqno Arrayf.t (* number of casts seen *)
  | Contact     of Endpt.full * View.id option (* contact for a merge *)

      (* HEAL gossip *)  
  | HealGos     of Proto.id * View.id * Endpt.full * View.t * Hsys.inet list
  | SwitchGos   of Proto.id * View.id * Time.t  (* SWITCH gossip *)
  | ExchangeGos	of string		(* EXCHANGE gossip *)

      (* INTER gossip *)
  | MergeGos    of (Endpt.full * View.id option) * seqno * typ * View.state
  | ViewState   of View.state	(* state of next view *)
  | ProtoId     of Proto.id	(* protocol id (only for down events) *)
  | Time        of Time.t	(* current time *)
  | Alarm       of Time.t	(* for alarm requests *)
  | ApplCasts   of seqno Arrayf.t
  | ApplSends   of seqno Arrayf.t
  | DbgName     of string

      (* Flags *)
  | NoTotal                     (* message is not totally ordered*)
  | ServerOnly	                (* deliver only at servers *)
  | ClientOnly	                (* deliver only at clients *)
  | NoVsync
  | ForceVsync
  | Fragment	                (* Iovec has been fragmented *)

      (* Debugging *)
  | History     of string       (* debugging history *)

      (* Private Secure Messaging *)
  | SecureMsg of Buf.t
  | ChannelList of (rank * Security.key) list
	
      (* interaction between Mflow, Pt2ptw, Pt2ptwp and the application *)
  | FlowBlock of rank option * bool

      (* Signature/Verification with Auth *)
  | AuthData of Addr.set * Auth.data

      (* Information passing between optimized rekey layers *)
  | Tree    of bool * Tree.z
  | TreeAct of Tree.sent
  | AgreedKey of Security.key

      (* The channel list held by the SECCHAN layer *)
  | SecChannelList of Trans.rank list
  | SecStat of int              (* PERF figures for SECCHAN layer *)
  | RekeyFlag of bool           (* Do a cleanup or not *)

Figure 3: Fields for events. Taken from type/event.mli
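One way to picture how a list of fields relates to an event with default values is the following minimal sketch. It models only a few of the fields listed above; the record layout, the default values, and the functions default, set and create are assumptions made for this example and should not be read as the interface of type/event.mli.

```ocaml
(* Toy model for illustration; NOT the definitions in type/event.mli. *)
type typ = ECast | ESend
type rank = int

(* A small subset of the fields shown in the figure. *)
type field =
  | Type of typ
  | Peer of rank
  | ApplMsg
  | NoTotal

(* Hypothetical event record: one slot per modelled field. *)
type event = { typ : typ; peer : rank; appl_msg : bool; no_total : bool }

(* Assumed defaults, chosen for this sketch only. *)
let default typ = { typ; peer = -1; appl_msg = false; no_total = false }

(* Apply a list of fields to an event, left to right. *)
let set ev fields =
  List.fold_left
    (fun ev f ->
      match f with
      | Type t -> { ev with typ = t }
      | Peer r -> { ev with peer = r }
      | ApplMsg -> { ev with appl_msg = true }
      | NoTotal -> { ev with no_total = true })
    ev fields

(* Build an event from its type and the fields that differ from the defaults. *)
let create typ fields = set (default typ) fields

let () =
  let ev = create ESend [ Peer 2; ApplMsg ] in
  Printf.printf "peer=%d appl_msg=%b no_total=%b\n" ev.peer ev.appl_msg ev.no_total
```

Fields that are not mentioned keep their default value, so a caller names only what differs from the defaults.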


4.2.1   Event Fields

4.3   Event fields and the ``types'' for which they are defined

[TODO]

4.4   Event Chains

We describe here common event sequences, or chains, in Ensemble. Event chains are sequences of alternating up and down events that ping-pong up and down the protocol stack, bouncing between the two end layers of the chain. The end layers are typically the top-most and bottom-most layers in the stack (e.g., TOP and BOT). The most common exceptions to this are the message chains (Sends and Broadcasts), which can have any layer for their top layer.

Note that these sequences are just prototypical. Necessarily, there are variations in which layers see which parts of these sequences. For example, consider the Failure Chain in a virtual synchrony stack with the GMP layer. The Failure Chain begins at the coordinator with an ESuspect event initiated at any layer in the stack. The BOT layer bounces this up as an ESuspect event. The top-most layer usually responds with an EFail event. The EFail event passes down through all the layers until it gets to the GMP layer. The GMP layer at the coordinator both passes the EFail event to the layer below and passes down an ECast event (thereby beginning a Broadcast Chain...). At the coordinator, the EFail event bounces off of the BOT layer as an EFail event and then passes up to the top of the protocol stack. At the other members, an ECast event will be received at the GMP layer. The message is marked as a ``Fail'' message, so the GMP layers generate and send down an EFail event (just like the one at the coordinator), and this is also bounced off the BOT layer as an EFail event. The lesson here is that the different layers in the different members of the group all saw essentially the same Failure Chain, but the exact sequencing was different. For example, the layers above the GMP layer at the members other than the coordinator did not see a down-going EFail event. [TODO: give diagram]
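The difference between what the coordinator and the other members see can be checked with a small model. This is a sketch under stated assumptions: the five-layer stack (Top, Above, Gmp, Below, Bot), the hop record, and the choice of Above as the layer that raises the suspicion are all hypothetical and serve only to replay the Failure Chain described above.

```ocaml
(* Toy model for illustration; layer names other than GMP, TOP and BOT
   are placeholders for "some layer above/below GMP". *)
type dir = Up | Dn
type typ = ESuspect | EFail | ECast
type layer = Top | Above | Gmp | Below | Bot

(* Position in the stack, counted from the top. *)
let pos = function Top -> 0 | Above -> 1 | Gmp -> 2 | Below -> 3 | Bot -> 4

(* One leg of a chain: an event emitted by [src] that travels to [dst]. *)
type hop = { typ : typ; dir : dir; src : layer; dst : layer }

(* A layer sees a hop if it lies strictly past the source and not past
   the destination, in the direction of travel. *)
let sees l h =
  let p = pos l in
  match h.dir with
  | Dn -> p > pos h.src && p <= pos h.dst
  | Up -> p < pos h.src && p >= pos h.dst

let saw l d t trace =
  List.exists (fun h -> h.typ = t && h.dir = d && sees l h) trace

(* Failure Chain as seen in the coordinator's stack. *)
let coordinator =
  [ { typ = ESuspect; dir = Dn; src = Above; dst = Bot };
    { typ = ESuspect; dir = Up; src = Bot; dst = Top };
    { typ = EFail; dir = Dn; src = Top; dst = Bot };
    { typ = ECast; dir = Dn; src = Gmp; dst = Bot };
    { typ = EFail; dir = Up; src = Bot; dst = Top } ]

(* Failure Chain as seen in the stack of any other member. *)
let other_member =
  [ { typ = ECast; dir = Up; src = Bot; dst = Gmp };
    { typ = EFail; dir = Dn; src = Gmp; dst = Bot };
    { typ = EFail; dir = Up; src = Bot; dst = Top } ]

let () =
  Printf.printf "layer above GMP sees EFail down: coord=%b member=%b\n"
    (saw Above Dn EFail coordinator)
    (saw Above Dn EFail other_member)
```

In this model the up-going EFail reaches every layer in both stacks, while the down-going EFail reaches the layers above GMP only at the coordinator.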

[TODO: Leave Chain]

4.4.1   Timer Chain

Request for a timer, followed by an alarm timeout.
ETimer down: timeout requested, sent down to BOT.
ETimer up: alarm generated in BOT at or after requested time, and sent up.

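The Timer Chain can be sketched as a tiny model of the bottom layer. The event constructors, the bot record and the functions handle_dn and tick are hypothetical names for this example; the sketch only illustrates that an alarm is sent up at or after the requested time, not how BOT is actually implemented.

```ocaml
(* Toy model for illustration; NOT Ensemble's BOT layer. *)
type event =
  | TimerDn of float (* ETimer down: carries the requested alarm time *)
  | TimerUp of float (* ETimer up: carries the current time *)

(* Hypothetical bottom-layer state: alarm times not yet delivered. *)
type bot = { mutable pending : float list }

let make_bot () = { pending = [] }

(* A timer request arriving from above is recorded. *)
let handle_dn bot ev =
  match ev with
  | TimerDn alarm -> bot.pending <- alarm :: bot.pending
  | TimerUp _ -> ()

(* When the clock reaches [now], every alarm that is due produces one
   ETimer up event; alarms in the future stay pending. *)
let tick bot now =
  let due, later = List.partition (fun alarm -> alarm <= now) bot.pending in
  bot.pending <- later;
  List.map (fun _ -> TimerUp now) due

let () =
  let bot = make_bot () in
  handle_dn bot (TimerDn 5.0);
  Printf.printf "alarms at t=4: %d\n" (List.length (tick bot 4.0));
  Printf.printf "alarms at t=6: %d\n" (List.length (tick bot 6.0))
```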
4.4.2   Send Chain

Send a pt2pt message followed by stability detection.
ESend down: send a pt2pt message down.
ESend up: destinations receive the message
EStable: message eventually becomes stable, and stability information is bounced off BOT.

4.4.3   Broadcast Chain

Broadcast of a message followed by stability detection.
ECast down: broadcast a message
ECast up: other members receive the broadcast
EStable: broadcast eventually becomes stable, and stability information is bounced off BOT

4.4.4   Failure Chain

Suspicion and ``failure'' of group members.
ESuspect down: suspicion of failures generated at any layer
ESuspect up: notification of suspicion of failures
EFail down: coord fails suspects
EFail up: all members get failure notice

4.4.5   Block Chain

Blocking of a group prior to a membership change.
ESuspect/EMergeRequest up: reasons for coord blocking
EBlock down: coord starts blocking
EBlock up: all members get block notice
EBlockOk down: all members reply to block notice
EBlockOk up: coord gets block OK notice
EMergeRequest/EView down: coord begins Merge or View chain
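The Block Chain can be read as a small state machine at the coordinator, plus a one-line reply rule at every member. The sketch below is a toy model: the state type, the ~merging flag (standing in for whatever makes the coordinator choose between a merge and a new view) and the function names are assumptions made for this example.

```ocaml
(* Toy model for illustration; NOT the code of any Ensemble layer. *)
type typ = ESuspect | EMergeRequest | EBlock | EBlockOk | EView
type state = Normal | Blocking | Blocked

(* Coordinator: given the current state and an event arriving from
   below (up), return the new state and the events to send down. *)
let coord_step ~merging state ev =
  match state, ev with
  | Normal, (ESuspect | EMergeRequest) -> (Blocking, [ EBlock ])
  | Blocking, EBlockOk ->
      (Blocked, [ (if merging then EMergeRequest else EView) ])
  | s, _ -> (s, [])

(* Every member: reply to an up-going EBlock with a down-going EBlockOk. *)
let member_reply = function
  | EBlock -> Some EBlockOk
  | _ -> None

let () =
  let s, out = coord_step ~merging:false Normal ESuspect in
  Printf.printf "blocking=%b sent=%d\n" (s = Blocking) (List.length out)
```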

4.4.6   View Chain

Installation of a new view, followed by garbage collection of the old view.
EView down: coord begins view chain (after failed merge or blocking)
EView up: all members get view notice
EExit down: protocol stacks are ready for garbage collection [todo]
EExit up: protocol stacks are garbage collected
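A chain such as the View Chain can be written down as data and checked for the alternation of down and up events that characterizes event chains. The types and the alternates function are a minimal sketch for this example, not part of Ensemble.

```ocaml
(* Toy model for illustration; NOT the definitions in type/event.mli. *)
type dir = Up | Dn
type typ = EView | EExit

(* True when consecutive events in the chain travel in opposite directions. *)
let rec alternates = function
  | (_, d1) :: ((_, d2) :: _ as rest) -> d1 <> d2 && alternates rest
  | _ -> true

(* The View Chain as listed above. *)
let view_chain = [ (EView, Dn); (EView, Up); (EExit, Dn); (EExit, Up) ]

let () = Printf.printf "view chain alternates: %b\n" (alternates view_chain)
```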

4.4.7   Merge Chain (successful)

Partition A merges with partition B, followed by garbage collection of the old view. We focus on partition A and only give a subset of events in partition B.
EMergeRequest down: coord A begins merge chain (after blocking)
EMergeRequest up: coord B gets merge request
EMergeGranted down: coord B replies to merge request
EMergeGranted up: coord A gets merge OK notice
EView down: coord A installs new view for coord B
EView up: all members in group A get view notice
EExit down: protocol stacks are ready for garbage collection
EExit up: protocol stacks are garbage collected
[TODO: EExit above is currently ELeave]

4.4.8   Merge Chain (failed)

Failed merge, followed by installation of a view.
EMergeRequest down: coord begins merge chain (after blocking)
EMergeFailed or EMergeDenied up: coord detects merge problem
EView down: coord begins view chain
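The two Merge Chains differ only in the reply that coordinator A receives after sending EMergeRequest down. The following sketch makes that branch explicit; the outcome type and the function on_merge_reply are hypothetical names introduced for this example.

```ocaml
(* Toy model for illustration; NOT the code of any Ensemble layer. *)
type typ = EMergeRequest | EMergeGranted | EMergeDenied | EMergeFailed | EView
type outcome = Merged | NotMerged

(* Coordinator A: react to an up-going reply to its merge request.
   In both cases an EView is sent down next; what differs is whether the
   view being installed is the merged one. *)
let on_merge_reply = function
  | EMergeGranted -> Some (Merged, EView)
  | EMergeDenied | EMergeFailed -> Some (NotMerged, EView)
  | EMergeRequest | EView -> None

let () =
  match on_merge_reply EMergeDenied with
  | Some (NotMerged, EView) -> print_endline "merge failed: begin view chain"
  | _ -> print_endline "unexpected"
```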
