7 Native C Ensemble Application Interface (CE)
The C application interface is very similar in design to the ML
interface. It is located in the ce directory. It was adapted from
the original ML interface to fit the C language better (its type
system and native data structures).
There are seven callbacks a C application needs to define in order
to work with Ensemble. These are:
- install(env, ls, vs): called whenever a new view is installed.
- exit(env): called when the member leaves.
- receive_cast(env, origin, num, iovl): called with the origin and an
iovec array (and its length) whenever a multicast message arrives.
- receive_send(env, origin, num, iovl): called with the origin and an
iovec array (and its length) whenever a point-to-point message arrives.
- flow_block(env, origin, onoff): called whenever there are
flow-control problems; the application should then refrain from
sending messages until further notice.
- block(env): called whenever a view change is forthcoming. All
applications are blocked, the old view is stabilized and cleaned, and
room is made for the new view.
- heartbeat(env, time): called every timeout. The timeout is specified
in the jops structure. Timers are not exact: this callback may be
called at inaccurate times, or more often than necessary. If accuracy
is required, the application should check the time argument.
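Because heartbeat may fire early or repeatedly, a handler that wants roughly periodic behaviour can gate its work on the time argument. The sketch below assumes only that successive time values do not decrease and share units with the configured interval; hb_state_t and my_heartbeat are hypothetical names, and ce_time_t and ce_env_t are redeclared locally from the typedefs in this section so the fragment compiles without the Ensemble headers.

```c
/* Local stand-ins for the CE typedefs listed in this section. */
typedef double ce_time_t;
typedef void *ce_env_t;

/* Hypothetical application state: when the periodic work last ran and how
 * often it should run (same units as the time argument). */
typedef struct {
    ce_time_t last_run;
    ce_time_t interval;
    int runs;
} hb_state_t;

/* Heartbeat handler that tolerates early or repeated calls: the periodic
 * work runs only once at least `interval` has elapsed since the last run. */
void my_heartbeat(ce_env_t env, ce_time_t now) {
    hb_state_t *s = (hb_state_t *) env;
    if (now - s->last_run < s->interval)
        return;              /* called too early: ignore this tick */
    s->last_run = now;
    s->runs++;               /* the application's periodic work goes here */
}
```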
The environment argument, which is the first argument of all seven
callbacks, is registered when a C application interface is created.
The types of the callbacks are as follows:
typedef int ce_rank_t ;
typedef int ce_len_t ;
typedef void *ce_env_t ;
typedef double ce_time_t ;
typedef void (*ce_appl_install_t)(ce_env_t, ce_local_state_t*, ce_view_state_t*);
typedef void (*ce_appl_exit_t)(ce_env_t) ;
typedef void (*ce_appl_receive_cast_t)(ce_env_t, ce_rank_t, int, ce_iovec_array_t) ;
typedef void (*ce_appl_receive_send_t)(ce_env_t, ce_rank_t, int, ce_iovec_array_t) ;
typedef void (*ce_appl_flow_block_t)(ce_env_t, ce_rank_t, ce_bool_t) ;
typedef void (*ce_appl_block_t)(ce_env_t) ;
typedef void (*ce_appl_heartbeat_t)(ce_env_t, ce_time_t) ;
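To make these signatures concrete, here is a minimal sketch of a complete handler set. The opaque view types, ce_bool_t (assumed to be an integer) and the iovec array (struct iovec, as on Linux per section 7.3) are hypothetical stand-ins for the Ensemble C headers; app_t and the app_* handlers are illustrative names. Assigning each handler to a variable of its typedef'd pointer type makes the compiler diagnose any signature mismatch.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-ins for the Ensemble C headers. */
typedef int ce_rank_t;
typedef int ce_bool_t;                              /* assumed integer boolean */
typedef void *ce_env_t;
typedef double ce_time_t;
typedef struct iovec ce_iovec_t;                    /* Linux definition, 7.3 */
typedef ce_iovec_t *ce_iovec_array_t;
typedef struct ce_local_state_t ce_local_state_t;   /* opaque in this sketch */
typedef struct ce_view_state_t ce_view_state_t;

/* The callback types, as listed above. */
typedef void (*ce_appl_install_t)(ce_env_t, ce_local_state_t*, ce_view_state_t*);
typedef void (*ce_appl_exit_t)(ce_env_t);
typedef void (*ce_appl_receive_cast_t)(ce_env_t, ce_rank_t, int, ce_iovec_array_t);
typedef void (*ce_appl_receive_send_t)(ce_env_t, ce_rank_t, int, ce_iovec_array_t);
typedef void (*ce_appl_flow_block_t)(ce_env_t, ce_rank_t, ce_bool_t);
typedef void (*ce_appl_block_t)(ce_env_t);
typedef void (*ce_appl_heartbeat_t)(ce_env_t, ce_time_t);

/* Hypothetical application state, passed to every callback as env. */
typedef struct {
    int blocked;        /* set by block, cleared by install */
    int casts;          /* multicasts received */
    size_t cast_bytes;  /* total payload bytes received by multicast */
} app_t;

static void app_install(ce_env_t env, ce_local_state_t *ls, ce_view_state_t *vs) {
    (void) ls; (void) vs;             /* a real handler would keep these */
    ((app_t *) env)->blocked = 0;     /* new view: sending is allowed again */
}

static void app_exit(ce_env_t env) { (void) env; }

static void app_cast(ce_env_t env, ce_rank_t origin, int num, ce_iovec_array_t iovl) {
    app_t *a = (app_t *) env;
    (void) origin;
    a->casts++;
    for (int i = 0; i < num; i++)
        a->cast_bytes += iovl[i].iov_len;   /* read-only: never free or modify iovl */
}

static void app_send(ce_env_t env, ce_rank_t origin, int num, ce_iovec_array_t iovl) {
    (void) env; (void) origin; (void) num; (void) iovl;
}

static void app_flow_block(ce_env_t env, ce_rank_t origin, ce_bool_t onoff) {
    (void) env; (void) origin; (void) onoff;
}

static void app_block(ce_env_t env) { ((app_t *) env)->blocked = 1; }

static void app_heartbeat(ce_env_t env, ce_time_t t) { (void) env; (void) t; }

/* Each initializer draws a compiler diagnostic if a signature is wrong. */
ce_appl_install_t      h_install    = app_install;
ce_appl_exit_t         h_exit       = app_exit;
ce_appl_receive_cast_t h_cast       = app_cast;
ce_appl_receive_send_t h_send       = app_send;
ce_appl_flow_block_t   h_flow_block = app_flow_block;
ce_appl_block_t        h_block      = app_block;
ce_appl_heartbeat_t    h_heartbeat  = app_heartbeat;
```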
A ce_appl_intf_t is the type of a C application interface
(cappl). It is created by the constructor ce_create_intf. No
destructor is needed, because Ensemble frees the interface structure
and all related memory after the exit callback is invoked. An
application interface is opaque; it can be used to create an endpoint
and join a group. It cannot be used to join more than a single group.
typedef struct ce_appl_intf_t ce_appl_intf_t ;
The constructor takes the above handlers as parameters, as well as
the environment argument (env) that is passed back to every callback.
ce_appl_intf_t*
ce_create_intf(
ce_env_t env,
ce_appl_exit_t exit,
ce_appl_install_t install,
ce_appl_flow_block_t flow_block,
ce_appl_block_t block,
ce_appl_receive_cast_t cast,
ce_appl_receive_send_t send,
ce_appl_heartbeat_t heartbeat
);
The first operation a CE application calls is ce_Init. It
initializes the internal Ensemble data structures and processes
command-line arguments.
void ce_Init(int argc, char **argv) ;
After a C application completes initialization, it should pass control
to the Ensemble main loop via ce_Main_loop.
void ce_Main_loop ();
To join a group, use the ce_Join operation.
void ce_Join(ce_jops_t *ops, ce_appl_intf_t *c_appl) ;
7.1 Group operations
Similarly to the ML interface, the set of supported operations is:
Leave, Cast, Send, Send1, Prompt, Suspect, XferDone, Rekey,
ChangeProtocol, and ChangeProperties. Messages are arrays of
IO-vectors (iovecs), or C memory chunks. The application can
send and receive iovec-arrays.
Multicast an iovec-array to the group.
void ce_Cast(
ce_appl_intf_t *c_appl,
int num,
ce_iovec_array_t iovl
) ;
Send a point-to-point message to a set of group members.
void ce_Send(
ce_appl_intf_t *c_appl,
int num_dests,
ce_rank_array_t dests,
int num,
ce_iovec_array_t iovl
) ;
Send a point-to-point message to the specified group member.
void ce_Send1(
ce_appl_intf_t *c_appl,
ce_rank_t dest,
int num,
ce_iovec_array_t iovl
) ;
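A cast or send therefore starts by filling in an iovec array. The following is a minimal sketch, assuming the Linux definition of ce_iovec_t given in section 7.3 and that Ensemble's default iovec allocator is compatible with malloc; make_iovl is a hypothetical helper, not part of CE.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

typedef struct iovec ce_iovec_t;       /* Linux definition from section 7.3 */
typedef ce_iovec_t *ce_iovec_array_t;

/* Hypothetical helper: copy a header and a body into two heap buffers and
 * describe them with a freshly allocated two-element iovec array.
 * Both lengths are assumed non-zero. Returns NULL on allocation failure. */
ce_iovec_array_t make_iovl(const void *hdr, size_t hdr_len,
                           const void *body, size_t body_len) {
    ce_iovec_array_t iovl = malloc(2 * sizeof *iovl);
    void *h = malloc(hdr_len);
    void *b = malloc(body_len);
    if (!iovl || !h || !b) {
        free(iovl); free(h); free(b);
        return NULL;
    }
    memcpy(h, hdr, hdr_len);
    memcpy(b, body, body_len);
    iovl[0].iov_base = h; iovl[0].iov_len = hdr_len;
    iovl[1].iov_base = b; iovl[1].iov_len = body_len;
    return iovl;
}
```

A caller would then hand the array over with ce_Cast(c_appl, 2, iovl). Per section 7.3, Ensemble takes over the message body at that point, so the sender must not free or reuse the buffers afterwards; whether the array itself is also consumed should be checked against the CE headers.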
The control actions are the same as the ML actions.
Leave a group. Following this downcall, exit will be called,
freeing the cappl.
void ce_Leave(ce_appl_intf_t *c_appl) ;
Ask for a new view.
void ce_Prompt(
ce_appl_intf_t *c_appl
);
Report specified group members as failure-suspected.
void ce_Suspect(
ce_appl_intf_t *c_appl,
int num,
ce_rank_array_t suspects
);
Inform Ensemble that the state-transfer is complete.
void ce_XferDone(
ce_appl_intf_t *c_appl
) ;
Ask the system to rekey.
void ce_Rekey(
ce_appl_intf_t *c_appl
) ;
Request a protocol change. The protocol_name is a string
specifying the exact set of layers to use. The string is a
colon-separated list of layers, for example:
Top:Heal:Switch:Leave:Inter:Intra:Elect:Merge:Sync:Suspect:Stable:Vsync:Frag_Abv:Top_appl:Frag:Pt2ptw:Mflow:Pt2pt:Mnak:Bottom
void ce_ChangeProtocol(
ce_appl_intf_t *c_appl,
char *protocol_name
) ;
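Since the protocol name is just layer names joined by colons, an application that assembles a stack programmatically needs a bounded string join. This is a minimal sketch; join_layers is a hypothetical helper, and the three-layer stack in the usage is only a formatting illustration, not a recommended protocol.

```c
#include <string.h>

/* Hypothetical helper: join n layer names with ':' into out (capacity cap).
 * Returns 0 on success, -1 if the result (plus NUL) would not fit. */
int join_layers(char *out, size_t cap, const char *const *layers, int n) {
    size_t used = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        size_t len = strlen(layers[i]);
        size_t need = len + (i > 0);         /* ':' before all but the first */
        if (used + need + 1 > cap)           /* +1 for the terminating NUL */
            return -1;
        if (i > 0)
            out[used++] = ':';
        memcpy(out + used, layers[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Given the rule in section 7.3 that structures passed from C to ML are consumed by ML, passing ce_ChangeProtocol a heap copy of the result (for example via ce_copy_string, as the example in section 7.5 does for join options) is likely safer than passing a stack buffer.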
Request a protocol change by specifying properties.
properties is a string containing a colon-separated list of
properties, for example:
"Gmp:Sync:Heal:Switch:Frag:Suspect:Flow:Xfer".
The system deduces a protocol stack that abides by these properties.
void ce_ChangeProperties(
ce_appl_intf_t *c_appl,
char *properties
) ;
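The properties string uses the same colon-separated format, so an application that builds or inspects one can test for an entry by walking the separators. A minimal sketch; has_property is a hypothetical helper. Matching whole entries, rather than calling strstr, avoids false hits on a name that is a prefix of another entry.

```c
#include <string.h>

/* Hypothetical helper: return 1 if the colon-separated list contains an
 * entry exactly equal to name, 0 otherwise. */
int has_property(const char *list, const char *name) {
    size_t n = strlen(name);
    const char *p = list;
    while (*p) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t) (end - p) : strlen(p);
        if (len == n && strncmp(p, name, n) == 0)
            return 1;
        if (!end)
            break;            /* that was the last entry */
        p = end + 1;
    }
    return 0;
}
```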
7.2 Integration of other sockets into the main loop
Ensemble works in an event-driven fashion, where events come either
from the network or from the user. The system runs a loop that is
split between (1) waiting for input on incoming sockets using the
select system call, and (2) processing local application send/recv
and internal events.
The application hands over control to Ensemble after initialization,
but it may still wish to wait on its own sockets, e.g., stdin (on
Unix). To this end, we also support adding and removing sockets, and
putting handlers on them.
ce_handler_t is the type of handler called when there is input
to process on a socket.
typedef void (*ce_handler_t)(void*);
ce_AddSockRecv adds a socket to the list Ensemble listens to.
When input arrives on the socket, the handler is invoked with the
specified environment argument.
void ce_AddSockRecv(
CE_SOCKET socket,
ce_handler_t handler,
ce_env_t env
);
ce_RmvSockRecv is called to remove a socket from the list
Ensemble listens to.
void ce_RmvSockRecv(
CE_SOCKET socket
);
7.3 Memory management
The convention used throughout is that all
data-structures passed from C to ML are consumed by ML, and all
data-structures passed from ML to C are owned by the C side (hence
must be freed). This rule holds for all structures and data apart from
the iovec-arrays.
Ensemble does not copy messages from C to the ML heap; rather, it
separates C memory and ML memory completely. Messages are received
from the network and read directly into C buffers. Sent iovecs are
fragmented and sent directly on the network. Messages must be buffered
until all group members reliably receive them. To this end, a
reference-counting scheme is used to track iovec liveness: when an
iovec's reference count reaches zero, it is freed. In other words,
iovecs are owned by Ensemble. They are received either from the
user or from the network.
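The liveness rule can be illustrated with a generic reference-counted buffer. This is a sketch of the technique only, not Ensemble's internal code; refbuf_t and its functions are hypothetical names. Each holder (for instance, a pending retransmission) takes a reference, and the buffer is freed when the last one is dropped.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer illustrating the scheme above. */
typedef struct {
    int refs;
    size_t len;
    char *data;
} refbuf_t;

refbuf_t *refbuf_create(const void *src, size_t len) {
    refbuf_t *b = malloc(sizeof *b);
    if (!b)
        return NULL;
    b->data = malloc(len ? len : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    memcpy(b->data, src, len);
    b->len = len;
    b->refs = 1;                 /* the creator holds the first reference */
    return b;
}

void refbuf_ref(refbuf_t *b) { b->refs++; }

/* Drop one reference; returns 1 if it was the last and the buffer was freed. */
int refbuf_unref(refbuf_t *b) {
    if (--b->refs > 0)
        return 0;
    free(b->data);
    free(b);
    return 1;
}
```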
On Linux, the type of an iovec is:
typedef struct iovec ce_iovec_t ;
typedef ce_iovec_t *ce_iovec_array_t;
To get better control of the iovec memory system, the alloc and
free functions can be set by the user. The definitions are in
lib/mm.h.
These define the types of alloc and free functions.
typedef void* (*mm_alloc_t)(int);
typedef void (*mm_free_t)(char*);
The actual functions called to free and allocate iovec's.
mm_alloc_t mm_alloc_fun;
mm_free_t mm_free_fun;
Use these functions to set alloc and free. Be careful to
do this exactly once at application initialization, before
starting Ensemble.
void set_alloc_fun(mm_alloc_t f);
void set_free_fun(mm_free_t f);
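A custom allocator only has to match the two function-pointer types. As a minimal sketch, the pair below counts live iovec buffers, which can help spot leaks; counting_alloc, counting_free and live_buffers are hypothetical names, and the typedefs are copied locally from the declarations quoted above.

```c
#include <stdlib.h>

/* Local copies of the typedefs quoted above from lib/mm.h. */
typedef void* (*mm_alloc_t)(int);
typedef void (*mm_free_t)(char*);

/* Hypothetical counting allocator: tracks how many buffers are live. */
static long live_buffers = 0;

void *counting_alloc(int size) {
    void *p = malloc(size > 0 ? (size_t) size : 1);
    if (p)
        live_buffers++;
    return p;
}

void counting_free(char *p) {
    if (p) {
        live_buffers--;
        free(p);
    }
}

/* These initializers check that both functions have the required types. */
mm_alloc_t my_alloc = counting_alloc;
mm_free_t my_free = counting_free;
```

In an application, the pair would be installed once, at initialization before Ensemble starts, with set_alloc_fun(counting_alloc) and set_free_fun(counting_free).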
The upshot of this is that when a user sends or casts a message,
Ensemble takes over the message body. When a message is
delivered to the application, the user may copy it, or perform any
read-only operation while in the receive callback. The application may
not modify a received iovec, or assume it owns it.
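When an application needs a message after its receive callback returns, it must copy it. A minimal sketch of such a copy, assuming the Linux ce_iovec_t definition above; copy_iovl is a hypothetical helper. It reads the fragments without touching them and returns one malloc'd contiguous buffer that the application owns and later releases with free.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

typedef struct iovec ce_iovec_t;   /* Linux definition given above */

/* Hypothetical helper: copy num received fragments into one new buffer.
 * The iovecs are only read. Stores the payload length in *len_out and
 * NUL-terminates the copy for convenience. Returns NULL on failure. */
char *copy_iovl(int num, const ce_iovec_t *iovl, size_t *len_out) {
    size_t total = 0, off = 0;
    for (int i = 0; i < num; i++)
        total += iovl[i].iov_len;
    char *buf = malloc(total + 1);
    if (!buf)
        return NULL;
    for (int i = 0; i < num; i++) {
        memcpy(buf + off, iovl[i].iov_base, iovl[i].iov_len);
        off += iovl[i].iov_len;
    }
    buf[total] = '\0';
    *len_out = total;
    return buf;
}
```

Inside a receive_cast handler this would be called as copy_iovl(num, iovl, &len); the original iovl is left for Ensemble to release.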
7.4 The flat interface
Using iovecs is somewhat complex for simple applications;
therefore, a simplified "flat" interface exists.
The flat receive callbacks take a C memory chunk and its length as
arguments. This relieves the application of merging together the
set of buffers that make up an iovec-array, as well as of releasing
that array.
typedef void (*ce_appl_flat_receive_cast_t)(ce_env_t, ce_rank_t, ce_len_t, ce_data_t) ;
typedef void (*ce_appl_flat_receive_send_t)(ce_env_t, ce_rank_t, ce_len_t, ce_data_t) ;
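A flat buffer holds exactly len bytes and, as the example in section 7.5 notes, must not be freed by the receiver. A handler that keeps text therefore copies it and adds its own terminator. This sketch assumes ce_data_t is a char pointer (the example's handlers take char *msg in its place) and redeclares the types locally; chat_t and chat_recv_cast are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the CE types (ce_data_t assumed char *). */
typedef void *ce_env_t;
typedef int ce_rank_t;
typedef int ce_len_t;
typedef char *ce_data_t;
typedef void (*ce_appl_flat_receive_cast_t)(ce_env_t, ce_rank_t, ce_len_t, ce_data_t);

/* Hypothetical application state: the last message, as a private C string. */
typedef struct {
    char *last;
    ce_rank_t last_origin;
} chat_t;

/* Flat cast handler: data holds len bytes with no guaranteed terminator and
 * belongs to Ensemble, so keep a NUL-terminated private copy instead. */
void chat_recv_cast(ce_env_t env, ce_rank_t origin, ce_len_t len, ce_data_t data) {
    chat_t *c = (chat_t *) env;
    char *copy = malloc((size_t) len + 1);
    if (!copy)
        return;
    memcpy(copy, data, (size_t) len);
    copy[len] = '\0';
    free(c->last);            /* drop the previous private copy, if any */
    c->last = copy;
    c->last_origin = origin;
}

ce_appl_flat_receive_cast_t cast_handler = chat_recv_cast;
```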
Create a standard application interface using flat receive callbacks.
ce_appl_intf_t*
ce_create_flat_intf(
ce_env_t env,
ce_appl_exit_t exit,
ce_appl_install_t install,
ce_appl_flow_block_t flow_block,
ce_appl_block_t block,
ce_appl_flat_receive_cast_t cast,
ce_appl_flat_receive_send_t send,
ce_appl_heartbeat_t heartbeat
);
Cast and Send operations that work with buffers instead of iovec-arrays.
void ce_flat_Cast(
ce_appl_intf_t *c_appl,
ce_len_t len,
ce_data_t buf
) ;
void ce_flat_Send(
ce_appl_intf_t *c_appl,
int num_dests,
ce_rank_array_t dests,
ce_len_t len,
ce_data_t buf
) ;
void ce_flat_Send1(
ce_appl_intf_t *c_appl,
ce_rank_t dest,
ce_len_t len,
ce_data_t buf
) ;
7.5 An example
This section shows how to use the CE interface to write applications.
We walk through the ce/ce_mtalk.c demo program.
ce/ce_mtalk.c, like demo/mtalk.ml,
is a multi-person talk program. Messages are read from the user via stdin and multicast to the network.
state_t is the state structure used by the program. It is the
environment argument registered with the C interface. The state contains
the current view information, a pointer to its cappl, and a flag
indicating whether we are blocked.
typedef struct state_t {
ce_local_state_t *ls;
ce_view_state_t *vs;
ce_appl_intf_t *intf ;
int blocked;
} state_t;
A helper function that multicasts a message if we are not blocked.
We use the flat interface to avoid the messy handling of iovecs.
void cast(state_t *s, char *msg){
if (s->blocked == 0)
ce_flat_Cast(s->intf, strlen(msg), msg);
}
A handler for stdin. This callback is called whenever there is input
on the socket. It multicasts every line the user types. Be careful
not to send messages while we are blocked.
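void stdin_handler(void *env) {
state_t *s = (state_t*)env;
char buf[100], *tmp;
size_t len ;
if (fgets(buf, sizeof(buf), stdin) == NULL)
/* EOF or read error: nothing to send.
*/
return;
len = strlen(buf);
if (len == sizeof(buf) - 1 && buf[len-1] != '\n')
/* string too long (the newline did not fit), dumping it.
*/
return;
tmp = ce_copy_string(buf);
TRACE2("Read %s:", tmp);
cast(s, tmp);
}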
There is nothing special to do if we leave the group, the application
essentially halts.
void main_exit(void *env) { }
When a new view arrives, update the environment structure. Do not
forget to free the old view structure.
void main_install(void *env, ce_local_state_t *ls, ce_view_state_t *vs) {
state_t *s = (state_t*) env;
/* The state starts out zeroed, so there is no old view on the first install. */
if (s->ls != NULL)
ce_view_full_free(s->ls,s->vs);
s->ls = ls;
s->vs = vs;
s->blocked =0;
printf("%s nmembers=%d\n", ls->endpt, ls->nmembers);
}
Ignore flow-control problems. We are not supposed to have any of
these, since we use very little bandwidth.
void main_flow_block(void *env, ce_rank_t rank, ce_bool_t onoff) { }
Mark our blocked flag.
void main_block(void *env) {
state_t *s = (state_t*) env;
s->blocked=1;
}
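Print out any message that we receive. Be careful not to free the
received message. The buffer holds exactly len bytes and is not
NUL-terminated (the sender casts strlen(msg) bytes), so print it
with a bounded %.*s conversion.
void main_recv_cast(void *env, int rank, ce_len_t len, char *msg) {
printf("recv_cast <- %d msg=%.*s", rank, (int) len, msg);
}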
Ignore send messages; we are not supposed to get any of these.
void main_recv_send(void *env, int rank, ce_len_t len, char *msg) {
}
Ignore heartbeats.
void main_heartbeat(void *env, double time) { }
Create a join-options structure, and join the group "ce_mtalk".
Use a regular virtually-synchronous stack. Put a handler on stdin so that it is called whenever there is input.
There is no need to set the transport in the join-options structure;
in that case, the system uses the environment variable ENS_MODES.
void join() {
ce_jops_t *jops;
ce_appl_intf_t *main_intf;
state_t *s;
/* The rest of the fields should be zero. The
* conversion code should be able to handle this.
*/
jops = record_create(ce_jops_t*, jops);
record_clear(jops);
jops->hrtbt_rate=10.0;
// jops->transports = ce_copy_string("UDP");
jops->group_name = ce_copy_string("ce_mtalk");
jops->properties = ce_copy_string(CE_DEFAULT_PROPERTIES);
jops->use_properties = 1;
s = (state_t*) record_create(state_t*, s);
record_clear(s);
main_intf = ce_create_flat_intf(s,
main_exit, main_install, main_flow_block,
main_block, main_recv_cast, main_recv_send,
main_heartbeat);
s->intf= main_intf;
ce_Join (jops, main_intf);
ce_AddSockRecv(0, stdin_handler, s);
}
The main entry point, initialize the ML side, process command line
arguments, join the ce_mtalk group, and turn control over
to the Ensemble event loop.
int main(int argc, char **argv) {
ce_Init(argc, argv); /* Call Arge.parse, and appl_process_args */
join();
ce_Main_loop ();
return 0;
}