5 Ensemble ML Application Interface
[TODO: add example handlers from mtalk]
We present a simple interface for building single-group applications. This
interface is intended to make small applications easy to build, and to protect
users from complications in the internals of the system.
The interface is implemented as a set of callbacks the application
provides to Ensemble. The application is notified through these
callbacks (in a similar fashion to callbacks with Motif widgets) of
events that occur in the system, such as message receipts and
membership changes.
The interface for a member of a group is always in one of two states,
blocked or unblocked. While unblocked, only the
recv_send, recv_cast, and heartbeat
callbacks are enabled. This is the normal state of the system. While
blocked, the application should refrain from sending messages. It is
still able to send, but doing so causes the system to fail with the
notification ``sending while blocked''.
Messages are sent by returning lists of actions from these callbacks.
An action is usually a message send: either a Cast (group
broadcast) or a Send (point-to-point message). Thus, messages are
delivered by callbacks from Ensemble and further messages are sent by
returning values from these callbacks.
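The "deliver by callback, send by return value" pattern can be sketched as follows. This is a minimal illustration, not code from Ensemble: the types are local stand-ins modelled on the definitions in section 5.2, redeclared so the fragment compiles on its own, and the reply text is made up.

```ocaml
(* Local stand-ins for the types of section 5.2 (assumption: simplified
 * copies, not the real Ensemble definitions). *)
type rank = int
type cast_or_send = C | S
type blocked = U | B
type ('cast_msg, 'send_msg) action =
  | Cast of 'cast_msg
  | Send1 of rank * 'send_msg

(* A receive-style callback: the message arrives as an argument and any
 * replies leave as the returned array of actions. *)
let receive (origin : rank) (bk : blocked) (cs : cast_or_send) (msg : string)
    : (string, string) action array =
  match bk, cs with
  | B, _ -> [||]                                  (* blocked: send nothing *)
  | U, C -> [| Send1 (origin, "ack: " ^ msg) |]   (* answer a broadcast privately *)
  | U, S -> [||]                                  (* no reply to point-to-point *)
```

Returning the empty array is how a callback says "nothing to send", which is also the safe choice while blocked.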
5.1 Compilation
Compiling ML applications is easy. You can use demo/Makefile as a
skeleton for your own applications.
5.2 Interface Definition and Initialization
Below is the full ML interface type definition for the application
interface described here. A group member is initialized by creating
an interface record which defines a set of callback handlers for the
application. This is then passed to one of the Ensemble stack
initialization functions exported by appl/appl.mli.
(* Some type aliases.
*)
type rank = int
type view = Endpt.id list
type origin = rank
type dests = rank array
type control =
| Leave
| Prompt
| Suspect of rank list
| XferDone
| Rekey of bool
| Protocol of Proto.id
| Migrate of Addr.set
| Timeout of Time.t (* not supported *)
| Dump
| Block of bool (* not for casual use *)
| No_op
type ('cast_msg,'send_msg) action =
| Cast of 'cast_msg
| Send of dests * 'send_msg
| Send1 of rank * 'send_msg
| Control of control
(* APPL_INTF.New.full: The record interface for applications. An
* application must define all the following callbacks and
* put them in a record.
*)
type cast_or_send = C | S
type blocked = U | B
type 'msg naction = ('msg,'msg) action
type 'msg handlers = {
flow_block : rank option * bool -> unit ;
block : unit -> 'msg naction array ;
heartbeat : Time.t -> 'msg naction array ;
receive : origin -> blocked -> cast_or_send -> 'msg -> 'msg naction array ;
disable : unit -> unit
}
type 'msg full = {
heartbeat_rate : Time.t ;
install : View.full -> ('msg naction array) * ('msg handlers) ;
exit : unit -> unit
}
5.3 Actions
Some callbacks allow a (possibly empty) array of actions to be
returned. There are 4 different kinds of actions:
- [Cast(msg)] : Causes msg to be broadcast to the group.
- [Send(dests,msg)] : Causes msg to be sent to a subset of the
group specified in dests. dests is an array of ranks.
- [Send1(dest,msg)] : Same as Send, but sends msg to a
single destination. This is slightly more efficient for single destinations.
- [Control c] : This bundles together all control actions. There
are several of these:
  - [Leave] : Causes the member to leave the group. There should always
  be at most one Leave action returned in an action array.
  - [Prompt] : Ask the system to perform a view-change immediately.
  - [XferDone] : Signals that this member has completed its state
  transfer. If a state transfer layer is in the protocol stack, this
  will trigger a new non-state transfer view after all members have
  taken an XferDone action.
  - [Rekey opt] : Ask the system to rekey itself. This should be done in case
  the current key may have been compromised, for example, if a
  previously trusted member should be expelled. The opt
  parameter describes whether previously constructed pt-2-pt session
  keys can be used to optimize this operation, or whether this is
  disallowed. For the casual user, the optimized version (opt = false)
  should be used.
  - [Protocol(protocol)] : Requests a protocol switch. If the stack supports
  protocol switches, a new view will be triggered.
  - [Dump] : Causes some debugging output to be printed by the stack in use.
  The output depends greatly on the protocol stack.
  - The rest of the actions are not intended for the casual user. They
  are either not supported, badly supported, or used by system internals.
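As an illustration of the action kinds listed above, here is a small sketch that builds an action array and checks the "at most one Leave" rule. The types are reduced local copies of those in section 5.2, and count_leaves is a hypothetical helper, not an Ensemble function.

```ocaml
(* Reduced local copies of the action types (assumption: only a few
 * control actions are modelled). *)
type rank = int
type control = Leave | Prompt | XferDone | Dump
type ('c, 's) action =
  | Cast of 'c
  | Send of rank array * 's
  | Send1 of rank * 's
  | Control of control

(* Hypothetical helper: count Leave requests so a handler can check that
 * it returns at most one of them. *)
let count_leaves actions =
  Array.fold_left
    (fun n a -> match a with Control Leave -> n + 1 | _ -> n)
    0 actions

(* Say goodbye to everyone, send a note to ranks 0 and 2, then leave. *)
let farewell : (string, string) action array =
  [| Cast "goodbye"; Send ([| 0; 2 |], "see you"); Control Leave |]
```

A handler could assert count_leaves actions <= 1 before returning during development.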
5.4 The install callback
Whenever a new view is installed, the application install callback is
called. It returns a handlers record that describes several callbacks:
type 'msg handlers = {
flow_block : rank option * bool -> unit ;
block : unit -> 'msg naction array ;
heartbeat : Time.t -> 'msg naction array ;
receive : origin -> blocked -> cast_or_send -> 'msg -> 'msg naction array ;
disable : unit -> unit
}
flow_block source onoff is called whenever there are flow control
issues. The onoff value describes whether communication on the
specific channel can resume, or should be held back momentarily until
communication problems are resolved. If the source is None,
then the problematic channel is multicast; if it is
Some(rank), then there are issues with the point-to-point
connection between this endpoint and endpoint rank.
block () is called to notify the application to stop sending
messages, because a view change is pending. It is an error to send
messages from this point until a new view is installed and
install is called again.
heartbeat current_time is regularly called by Ensemble when
the application is unblocked. The expected rate of heartbeats is
specified through the heartbeat_rate field of the interface
record. The block, heartbeat, and receive callbacks each return an
action array.
receive origin bk cs msg is called when a message
has been received. The callback is made with the origin of the
message, the current block state (bk), whether this is a Cast or Send
message (cs), and the message itself.
The install callback is called with the current view state. It returns
a set of 5 handlers together with a set of actions to be performed
immediately. The callback itself is wrapped in a structure that bundles
it with the heartbeat rate and the exit function (see below).
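Putting the pieces of this section together, a complete interface record might look like the sketch below. It is self-contained only because it substitutes stand-ins for the Ensemble types: time replaces Time.t, view_full is a two-field placeholder for View.full, and the action type is reduced to two constructors. The message strings are invented for the example.

```ocaml
(* Stand-ins for Ensemble types (assumptions, see the text above). *)
type time = float                                  (* for Time.t *)
type view_full = { nmembers : int ; rank : int }   (* for View.full *)
type rank = int
type origin = rank
type cast_or_send = C | S
type blocked = U | B
type 'msg naction = Cast of 'msg | Send1 of rank * 'msg

type 'msg handlers = {
  flow_block : rank option * bool -> unit ;
  block : unit -> 'msg naction array ;
  heartbeat : time -> 'msg naction array ;
  receive : origin -> blocked -> cast_or_send -> 'msg -> 'msg naction array ;
  disable : unit -> unit
}

type 'msg full = {
  heartbeat_rate : time ;
  install : view_full -> 'msg naction array * 'msg handlers ;
  exit : unit -> unit
}

let interface : string full =
  let install (vf : view_full) =
    let handlers = {
      flow_block = (fun _ -> ()) ;
      block = (fun () -> [||]) ;
      heartbeat = (fun _ -> [||]) ;
      (* Acknowledge every message, but only while unblocked. *)
      receive = (fun origin bk _ msg ->
        match bk with
        | U -> [| Send1 (origin, "got " ^ msg) |]
        | B -> [||]) ;
      disable = (fun () -> ())
    } in
    (* Actions to perform immediately: rank 0 announces the new view. *)
    let hello =
      if vf.rank = 0
      then [| Cast (Printf.sprintf "view of %d members" vf.nmembers) |]
      else [||]
    in
    (hello, handlers)
  in
  { heartbeat_rate = 1.0 ; install ; exit = (fun () -> ()) }
```

Because install builds a fresh handlers record on every call, any per-view state can live in the closure created there rather than in global variables.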
5.5 View state
Several callbacks receive as an argument a pair of records with
information about the new view. The information is split into two
parts, a View.state and a View.local record. The
first contains information that is common to all the members in the
view, such as the view of the group. The same record is
delivered to all members. The second record contains information
local to the member that receives it. These fields include the
Endpt.id of the member and its rank in the view. It
also contains information that is derived from the View.state
record, such as nmembers, which is merely the length of the
view field.
(* VIEW.STATE: a record of information kept about views.
* This value should be common to all members in a view.
*)
type state = {
(* Group information.
*)
version : Version.id ; (* version of Ensemble *)
group : Group.id ; (* name of group *)
proto_id : Proto.id ; (* id of protocol in use *)
coord : rank ; (* initial coordinator *)
ltime : ltime ; (* logical time of this view *)
primary : primary ; (* primary partition? (only w/some protocols) *)
groupd : bool ; (* using groupd server? *)
xfer_view : bool ; (* is this an XFER view? *)
key : Security.key ; (* keys in use *)
prev_ids : id list ; (* identifiers for prev. views *)
params : Param.tl ; (* parameters of protocols *)
uptime : Time.t ; (* time this group started *)
(* Per-member arrays.
*)
view : t ; (* members in the view *)
clients : bool Arrayf.t ; (* who are the clients in the group? *)
address : Addr.set Arrayf.t ; (* addresses of members *)
out_of_date : ltime Arrayf.t ; (* who is out of date *)
lwe : Endpt.id Arrayf.t Arrayf.t ; (* for light-weight endpoints *)
protos : bool Arrayf.t (* who is using protos server? *)
}
(* VIEW.LOCAL: information about a view that is particular to
* a member.
*)
type local = {
endpt : Endpt.id ; (* endpoint id *)
addr : Addr.set ; (* my address *)
rank : rank ; (* rank in the view *)
name : string ; (* my string name *)
nmembers : nmembers ; (* # members in view *)
view_id : id ; (* unique id of this view *)
am_coord : bool ; (* rank = vs.coord? *)
falses : bool Arrayf.t ; (* all false: used to save space *)
zeroes : int Arrayf.t ; (* all zero: used to save space *)
loop : rank Arrayf.t ; (* ranks in a loop, skipping me *)
async : (Group.id * Endpt.id) (* info for finding async *)
}
(* LOCAL: create local record based view state and endpt.
*)
val local : debug -> Endpt.id -> state -> local
Most of the fields are moderately self-explanatory. If
xfer_view is true, then this view is only for state transfer
and all members should take an XferDone action when the state
transfer is complete. The view field is defined as View.t,
which is:
(* VIEW.T: an array of endpt id's.
*)
type t = Endpt.id Arrayf.t
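To make the relationship between the shared and the local record concrete, the sketch below derives a few local fields from a view. It is only an illustration of the idea: strings stand in for Endpt.id, a plain array stands in for Arrayf.t, and rank_of and local_of are hypothetical names, not the View.local function shown above.

```ocaml
(* Stand-ins: strings for endpoint ids, plain arrays for Arrayf.t. *)
type endpt = string
type local = { endpt : endpt ; rank : int ; nmembers : int ; am_coord : bool }

(* The rank of a member is its index in the view. *)
let rank_of (view : endpt array) (me : endpt) : int =
  let rec find i =
    if i >= Array.length view then raise Not_found
    else if view.(i) = me then i
    else find (i + 1)
  in
  find 0

(* Derive member-specific fields from information shared by all members:
 * the view itself and the rank of the coordinator. *)
let local_of ~(coord : int) (view : endpt array) (me : endpt) : local =
  let rank = rank_of view me in
  { endpt = me ; rank ; nmembers = Array.length view ; am_coord = (rank = coord) }
```

Every member can run the same derivation on the same shared record and obtain a different local record, which is why only the shared part needs to be agreed on by the group.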
5.6 Asynchronous operation
The application can only send messages when handling a callback.
Under some circumstances (such as when receiving input from another
source), it is necessary to send messages immediately rather than
waiting for the next regularly scheduled heartbeat to occur. Call the
function Appl.async with the group and endpoint of the member.
This returns a function that can be called whenever an immediate
heartbeat is desired. [This replaces the previous
heartbeat_now callback.]
let async = Appl.async (group,endpt) in
async ()
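One common way to use this facility is to queue outgoing messages and let the heartbeat handler drain the queue. The sketch below assumes that pattern. The async function is passed in as a parameter so the fragment compiles without Ensemble; in a real application it would be the function returned by Appl.async. The names pending and submit and the reduced naction type are hypothetical.

```ocaml
(* Reduced stand-in for the action type. *)
type 'msg naction = Cast of 'msg

(* Messages produced outside of any callback wait here. *)
let pending : string Queue.t = Queue.create ()

(* Heartbeat handler: turn everything queued since the last call into
 * Cast actions, oldest first. *)
let heartbeat (_now : float) : string naction array =
  let acts = Queue.fold (fun acc m -> Cast m :: acc) [] pending in
  Queue.clear pending ;
  Array.of_list (List.rev acts)

(* Called from outside the callbacks, e.g. when the user types a line.
 * [async] requests an immediate heartbeat so the message does not wait
 * for the next scheduled one. *)
let submit ~(async : unit -> unit) (line : string) : unit =
  Queue.add line pending ;
  async ()
```

The queue keeps the rule that messages are only sent from within a callback, while async () bounds how long a queued message waits.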
5.7 Exit notice
The exit callback is called when the member has left the group (through
a previous Leave action). This is the last callback the group member
will receive.
exit : unit -> unit ;
5.8 Properties
The Ensemble Property module is used to construct protocols based on
desired properties the application wants. You can look at appl/property.mli
for the various properties that are supported by Ensemble:
type id =
| Agree (* agreed (safe) delivery *)
| Gmp (* group-membership properties *)
| Sync (* view synchronization *)
| Total (* totally ordered messages *)
| Heal (* partition healing *)
| Switch (* protocol switching *)
| Auth (* authentication *)
| Causal (* causally ordered broadcasts *)
| Subcast (* subcast pt2pt messages *)
| Frag (* fragmentation-reassembly *)
| Debug (* adds debugging layers *)
| Scale (* scalability *)
| Xfer (* state transfer *)
| Cltsvr (* client-server management *)
| Suspect (* failure detection *)
| Flow (* flow control *)
| Migrate (* process migration *)
| Privacy (* encryption of application data *)
| Rekey (* support for rekeying the group *)
| OptRekey (* optimized rekeying protocol *)
| DiamRekey (* Diamond rekey algorithm *)
| Primary (* primary partition detection *)
| Local (* local delivery of messages *)
| Slander (* members share failure suspicions *)
| Asym (* overcome asymmetry *)
(* The following are not normally used.
*)
| Drop (* randomized message dropping *)
| Pbcast (* Hack: just use pbcast prot. *)
| Zbcast (* Use Zbcast protocol. *)
| Gcast (* Use gcast protocol. *)
| Dbg (* on-line modification of network topology *)
| Dbgbatch (* batch mode network emulation *)
| P_pt2ptwp (* Use experimental pt2pt flow-control protocol *)
Here is a short description of some of the properties:
- Gmp: Group Membership Properties.
- Sync: Synchronizes messages on view changes to ensure view synchrony.
- Total: Broadcast messages are totally ordered in the group.
- Heal: Group partitions are healed.
- Switch: Allows on-the-fly protocol switching.
- Auth: Allows only authenticated and authorized members into
the group. Creates secure agreement in the group on a mutual group
key. This key is used to sign and verify, using keyed-MD5, all group
messages. This protects the group from outside attack.
- Rekey: Allows rekeying the group.
- Privacy: Encrypts all user messages.
- Causal: Broadcasts are causally ordered.
- Subcast: Point-to-point messages are sent using filtered broadcasts.
Guarantees FIFO ordering between broadcasts and point-to-point messages.
- Frag: Message fragmentation. Allows messages of any size to be sent.
- Debug: Inserts a variety of ``assertion'' protocols that check that
other properties are being met.
- Scale: Switches some protocols with more scalable versions.
- Xfer: Causes the state transfer field (xfer_view) of view states to
be set.
- Cltsvr: Causes the clients field of view states to be set according to
whether members are ``clients'' or ``servers''.
- Suspect: Members watch other members for suspected failures.
- Zbcast: A probabilistic multicast protocol that does not guarantee
virtual synchrony. Has been used for experimental studies. See the
Cornell Spinglass system for more details.
- Gcast: A protocol that simulates IP-multicast using a binary
tree of pt-2-pt connections between group members.
The Property.choose function selects a protocol stack based on a list
of desired properties (you can examine the implementation to see exactly how
this is done):
(* Create protocol with desired properties.
*)
val choose : id list -> Proto.id
The default properties used for Ensemble applications are Property.vsync.
This is one of a variety of predefined protocol property lists defined in the
Property module:
let vsync = [Gmp;Sync;Heal;Migrate;Switch;Frag;Suspect;Flow]
let total = vsync @ [Total]
let scale = vsync @ [Scale]
let fifo = [Frag]
In order to set the properties used by an application, you would use the
following code:
(* Choose default view state.
*)
let vs = Appl.default_info "my-appl" in
(* Select desired properties.
*)
let properties = [ (* list of properties *) ] in
(* Choose corresponding protocol stack.
*)
let proto_id = Property.choose properties in
(* Set proto_id of the view state record.
*)
let vs = View.set vs [Vs_proto_id proto_id] in
(* Configure the application
*)
Appl.config_new my_interface vs ;
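The placeholder property list in the code above is typically built by extending one of the predefined lists. The sketch below shows one way to do that without introducing duplicates. It uses a reduced local copy of the property type, and add_props is a hypothetical helper rather than part of the Property module. Whether Property.choose accepts a particular combination is decided by Ensemble and is not checked here.

```ocaml
(* Reduced local copy of the property type (assumption: a subset only). *)
type id =
  | Gmp | Sync | Heal | Migrate | Switch | Frag | Suspect | Flow
  | Total | Auth | Privacy | Rekey

(* Same contents as the predefined Property.vsync list shown above. *)
let vsync = [Gmp; Sync; Heal; Migrate; Switch; Frag; Suspect; Flow]

(* Hypothetical helper: append extra properties, skipping any that the
 * base list already contains. *)
let add_props base extra =
  base @ List.filter (fun p -> not (List.mem p base)) extra

(* Virtual synchrony plus total order, authentication and encryption.
 * Frag is already in vsync, so it is not added a second time. *)
let secure_total = add_props vsync [Total; Auth; Privacy; Frag]
```

In a real application the resulting list would be bound to properties and passed to Property.choose as in the code above.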
As described in the reference manual, each of these protocols is derived by
combining a set of protocol layers to form a full protocol stack with
application-level properties. Here we describe the behavior of the
vsync protocol stack.
- The first callback a protocol stack receives is an
install with a singleton view.
- All members in the same partition of a group receive the same
View.state records (excepting the rank field, of
course).
- Send messages are delivered reliably and in FIFO order. It is
an error for a member to send a message to itself.
- Cast messages are delivered reliably and in FIFO order. FIFO
order for Cast messages means that members receive the
messages in the order they were sent by the sender. Cast
messages are usually not delivered to the sender (the primary
exceptions are stacks with total-ordering layers in them).
- There is no ordering relationship between Send and
Cast messages.
- Messages are delivered in the same view they were sent in (the
protocol stack ``blocks'' so that the protocols can flush all the
current messages out of the system before advancing to the next view).
- Cast messages are delivered atomically. This means that
either all members (excepting the sender) or none will receive a
Cast message. If the sender of a Cast message fails,
other members who received the message will retransmit it for the
failed member. When there is more than one member in a group, a
Cast message may be delivered to no members only if the sender
fails.
- All members that receive the same consecutive views (they get the same
install upcalls) will have delivered the same set of
Cast messages between the upcalls (but not necessarily in the
same order). Thus views can be considered as synchronization points
where all members agree on what has been done so far.
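When testing an application, the guarantees listed above can be checked against delivery logs. The following sketch assumes each member records the Cast messages it delivered between two install upcalls. The delivery record and the functions same_set and fifo_agree are hypothetical test helpers, not part of Ensemble. Since a sender usually does not receive its own Casts, the comparison is meant for the logs of two members other than the senders involved.

```ocaml
(* One delivered Cast: who sent it and what it carried. *)
type delivery = { origin : int ; payload : string }

(* The messages of one sender, in the order this member delivered them. *)
let from_sender sender log = List.filter (fun d -> d.origin = sender) log

(* Same set of Casts between two views, ignoring order. *)
let same_set a b = List.sort compare a = List.sort compare b

(* Per-sender FIFO: each sender's messages appear in the same relative
 * order in both logs, even if the interleaving differs. *)
let fifo_agree senders a b =
  List.for_all (fun s -> from_sender s a = from_sender s b) senders
```

Two logs that pass same_set and fifo_agree may still interleave different senders differently, which matches the vsync stack. A stack with the Total property would additionally require the two logs to be equal.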
5.9 Initializing Ensemble Applications
This is a description of how simple applications are initialized with
Ensemble. The source code presented here is extracted from the
mtalk demo, which is distributed with Ensemble. The source
can be found in demo/mtalk.ml which compiles and links with
the Ensemble library to form the demo/mtalk executable.
An application consists of two parts, initialization and an interface.
The initialization involves setting up Ensemble and the
communication framework. An interface consists of a set of callback
handlers that manage application events that Ensemble generates for
messages and membership changes. The initialization code tends to be
similar across applications, and the handlers tend to contain most of
the application-specific functionality. We present a sample set of
initialization code, which can easily be adapted for other simple
applications. We do not describe the callback handlers here; they are
described in section 5. For specific examples,
see demo/mtalk.ml and demo/rand.ml.
let run () =
(*
* Parse command line arguments.
*)
Arge.parse [
(*
* Extra arguments can go here.
*)
] (Arge.badarg name) "mtalk: multiperson talk program" ;
(*
* Get default transport and alarm info.
*)
let view_state = Appl.default_info "mtalk" in
let alarm = Alarm.get_hack () in
The initialization must do several things, all of which can be
contained in a single function, as shown here with the function
run. First parse the command-line arguments as is done above.
In addition to arguments provided by the application, this parses the
standard Ensemble arguments. Then, default_info is called.
This initializes a View.state record (which contains all the
information other modules need to initialize your application).
(*
* Choose a string name for this member. Usually
* this is "userlogin@host".
*)
let name =
try
let host = gethostname () in
(* Get a prettier name if possible.
*)
let host = string_of_inet (inet_of_string host) in
sprintf "%s@%s" (getlogin ()) host
with _ -> view_state.name
in
(*
* Initialize the application interface.
*)
let interface = intf name alarm in
Next we initialize the interface record that contains the
application's handlers and which does the actual work of the
application. How the interface is initialized is application
dependent. For example, interface will usually require
several arguments. In the mtalk application, the interface
takes the endpoint identifier of the application and a string name to
use for this member of the talk group. Other applications will use
different arguments.
(*
* Initialize the protocol stack, using the interface and
* view state chosen above.
*)
Appl.config_new interface view_state ;
The code above initializes the protocol stack. In this case we use
the vsync protocol properties, which provide FIFO,
virtually-synchronous communication and an automatic merging facility
for healing partitions. There are several different sets of
properties defined by the appl/property.mli module, each of which
provides different properties or performance characteristics (for
more information about properties, see section 5.8).
(*
* Enter a main loop
*)
Appl.main_loop ()
(* end of run function *)
(* Run the application, with exception handlers to catch any
* problems that might occur.
*)
let _ = Appl.exec ["mtalk"] run
The initialization is complete and we enter a main loop. The main
loop never returns. The final code calls the run function
with some standard exception handlers to catch any exceptions that
should not, but may, occur.
This is all that is required for initializing simple, single-group Ensemble
applications.