
10   Heterogeneous Transports

Ensemble provides a flexible infrastructure for sending messages over a variety of communication transports. Not only can different groups use different transports, but a single group can also communicate over multiple transports at the same time. The design of the transport module is split into three parts:
The socket module:
Low-level system calls (send, sendto, recv, etc.) implemented in a system-independent fashion. The code is in the socket directory. socket/u is a simple-minded implementation that uses the OCaml Unix library directly. A more efficient version is located in socket/s, which uses the native OS io-vector send/recv facilities.

Transports:
Self-registering transports: DEERING, UDP, TCP, and NETSIM. These use the low-level socket module calls to provide an abstract transport.

Routers:
These use a communication transport to build Ensemble-specific send/recv capabilities. A length field, group id, and endpoint rank are added to each outgoing message. Basic parsing is performed on received messages, extracting the sender rank, group, and message length. There are several routers in the route subdirectory. signed.ml adds a 16-byte MD5 checksum to each outgoing message; an agreed group secret is used to key MD5, providing group authentication. On receipt, this header is stripped from the message and verified. unsigned.ml is the vanilla router.

The user can choose either of the socket module implementations; the interface they share is defined in socket/socket.mli. The unoptimized implementation (usocket) represents message data as a Caml string and benefits from native garbage collection, at the cost of reduced performance. The optimized socket library (ssocket) uses native C io-vectors and the operating system's scatter-gather send/receive facilities. This provides much better performance and zero-copy integration with C applications, but makes integration with native ML values more difficult.

The transports are defined in the trans subdirectory: UDP in trans/udp.ml, TCP in trans/tcp.ml, DEERING in trans/ipmc, and NETSIM in trans/netsim. The route subdirectory contains three routers: signed, unsigned, and bypass.
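
As a rough illustration of the signed router's behaviour described above, the following sketch prepends a 16-byte MD5 digest, keyed with a group secret, to each outgoing message, then verifies and strips it on receipt. It uses OCaml's standard Digest module (which computes MD5). The function names and the secret-then-payload keying are assumptions made for illustration; the construction actually used in signed.ml is not shown in this section and may differ.

```ocaml
(* Hypothetical sketch of a keyed MD5 header; not the code in route/signed.ml. *)

(* MD5 digests are 16 bytes long. *)
let digest_len = 16

(* Assumed keying: hash the group secret followed by the payload. *)
let keyed_digest ~secret payload =
  Digest.string (secret ^ payload)

(* Prepend the digest to an outgoing message. *)
let sign ~secret payload =
  keyed_digest ~secret payload ^ payload

(* Strip and check the digest of an incoming message.
   Returns the payload if the digest matches, None otherwise. *)
let verify ~secret msg =
  let n = String.length msg in
  if n < digest_len then None
  else
    let hdr = String.sub msg 0 digest_len in
    let payload = String.sub msg digest_len (n - digest_len) in
    if String.equal hdr (keyed_digest ~secret payload) then Some payload
    else None
```

A member that does not hold the group secret cannot produce a header that verify accepts, which is the sense in which the checksum provides group authentication.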

10.1   Code walk-through

To give a better understanding of the design, this section walks through a configuration consisting of the unsigned router, the UDP transport, and the optimized socket library. We start from the bottom and work our way up. The file socket/s/sendrecv.c contains the code for sending an array of C io-vectors together with part of an ML string. The function takes five arguments:

value skt_sendtosv(
	value info_v,
	value prefix_v,
	value ofs_v,
	value len_v,
	value iova_v
) {
  int naddr=0, i, ret=0;
  ocaml_skt_t sock=0 ;
  skt_sendto_info_t *info ;

  info = skt_Sendto_info_val(info_v);
  send_msghdr.msg_iovlen = prefixed_gather(prefix_v, ofs_v, len_v, iova_v); 

  send_msghdr.msg_namelen = info->addrlen ;
  sock = info->sock ;
  naddr = info->naddr ;

  for (i=0;i<naddr;i++) {
    /* Send the message.  Assume we don't block or get interrupted.  
     */
    send_msghdr.msg_name = (char*) &info->sa[i] ;
    ret = sendmsg(sock, &send_msghdr, 0) ;
  }

  return Val_unit;
}
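
To make the gather step concrete, here is a minimal OCaml model of the message layout that the call to prefixed_gather presumably prepares: the selected slice of the ML string comes first, followed by each io-vector in order. The gather function and the use of plain strings for io-vectors are assumptions for illustration only; the C code fills in an io-vector array for sendmsg rather than copying the data.

```ocaml
(* Hypothetical model of the scatter-gather layout. The real code avoids
   this copy by handing the pieces to sendmsg as separate io-vectors. *)
let gather (prefix : string) (ofs : int) (len : int) (iova : string array)
    : string =
  if ofs < 0 || len < 0 || ofs + len > String.length prefix then
    invalid_arg "gather: slice out of bounds";
  let buf = Buffer.create (len + 64) in
  (* First the [ofs, ofs+len) slice of the ML string ... *)
  Buffer.add_substring buf prefix ofs len;
  (* ... then every io-vector, in array order. *)
  Array.iter (Buffer.add_string buf) iova;
  Buffer.contents buf
```

For example, gather "XXhdrYY" 2 3 [| "abc"; "de" |] yields the 8-byte payload "hdrabcde".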

skt_sendtosv is hidden inside the socket library and is accessed safely through Socket.sendtosv. The sendto_info structure is created from a sending socket and an array of target socket addresses.

type sendto_info
val sendto_info : socket -> Unix.sockaddr array -> sendto_info

val sendtosv : sendto_info -> buf -> ofs -> len -> Basic_iov.t array -> unit

The Hsys module makes access to sendtosv safer, and changes its type:

  val sendtosv : sendto_info -> Buf.t -> ofs -> len -> Iovecl.t -> unit

(* Implementation *)
  Iovec.Priv.sendtosv info 
    (Buf.string_of buf) (Buf.int_of_len ofs) (Buf.int_of_len len) 
    (Iovecl.to_iovec_array iovl) 
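
The conversions in the wrapper above hint at how the typed interface works: lengths and offsets are kept abstract, so they cannot be confused with arbitrary integers or with each other. A minimal sketch of that idea follows. The module name Tbuf and all of its functions are hypothetical and only loosely modelled on Buf, whose actual definition is not shown here.

```ocaml
(* Hypothetical module illustrating abstract length/offset types. *)
module Tbuf : sig
  type len                       (* abstract byte length *)
  type ofs                       (* abstract byte offset *)
  type t                         (* abstract buffer *)
  val len_of_int : int -> len
  val int_of_len : len -> int
  val ofs_of_int : int -> ofs
  val int_of_ofs : ofs -> int
  val of_string : string -> t
  val sub : t -> ofs -> len -> string
end = struct
  type len = int
  type ofs = int
  type t = string
  (* Reject negative values at the point of conversion. *)
  let check name n = if n < 0 then invalid_arg name else n
  let len_of_int n = check "len_of_int" n
  let int_of_len l = l
  let ofs_of_int n = check "ofs_of_int" n
  let int_of_ofs o = o
  let of_string s = s
  let sub b o l = String.sub b o l
end

(* A routine in this style accepts only typed arguments, so swapping the
   offset and the length is rejected by the type checker. *)
let payload buf ofs len = Tbuf.sub buf ofs len
```

With these signatures, payload buf len ofs does not compile, whereas the same mistake with two int arguments would go unnoticed until run time.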

Core Ensemble code, including the routers, does not use Socket calls directly. Rather, it uses the Hsys module, which wraps all calls with a more type-safe interface. Separate types are used for length, offset, io-vector, and buffer. The UDP implementation in trans/udp.ml uses Hsys in its transmit function, called x.

  let x hdr ofs len iovl = 
    Hsys.sendtosv dests hdr ofs len iovl;
    Iovecl.free iovl

The io-vector array is freed after the message is transmitted. The reference count for an iovec array is decremented on two occasions: (1) when it has been sent on the network, and (2) when it has been handed to an application and the callback has completed. The iovec refcount is initially set to one when the application sends it, and it is incremented whenever a copy of it is created. Ultimately, the refcount is decremented when the stability detection protocol determines that all group members have received the message.
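
The reference-counting discipline can be sketched as follows. This is a simplified model under stated assumptions, not the Iovec implementation: the type rbuf and the functions create, copy, free, and contents are hypothetical names, and "releasing" the buffer is modelled by setting a flag rather than returning memory to an allocator.

```ocaml
(* Hypothetical reference-counted buffer; not the Iovec implementation. *)
type rbuf = {
  data : Bytes.t;
  mutable refs : int;          (* number of outstanding references *)
  mutable released : bool;     (* set once the count reaches zero *)
}

(* A freshly sent buffer starts with a single reference. *)
let create s = { data = Bytes.of_string s; refs = 1; released = false }

(* Read access is only valid while the buffer is live. *)
let contents b =
  if b.released then invalid_arg "contents: buffer already released";
  Bytes.to_string b.data

(* Taking a copy shares the data and increments the count. *)
let copy b =
  if b.released then invalid_arg "copy: buffer already released";
  b.refs <- b.refs + 1;
  b

(* Dropping a reference releases the buffer when none remain. *)
let free b =
  if b.released then invalid_arg "free: buffer already released";
  b.refs <- b.refs - 1;
  if b.refs = 0 then b.released <- true
```

In this model, a transport that takes a copy for transmission and frees it afterwards leaves the original reference intact, so the buffer survives until its last holder calls free.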

10.2   Design of the routers

Many endpoints belonging to different groups can coexist in a single Ensemble process. Each endpoint is identified by its connection identifier. The internal representation of this id is given in module Conn:

type id = {
  version       : Version.id ;
  group 	: Group.id ;
  stack 	: Stack_id.t ;
  proto 	: Proto.id option ;
  view_id 	: View.id option ;
  sndr_mbr 	: sndr_mbr ;
  dest_mbr 	: dest_mbr ;
  dest_endpt 	: dest_endpt option
}

The id is mapped into a string using the Route.pack_of_conn function. Ensemble uses MD5 for this mapping. The probability of a collision, i.e., for two different endpoints to map onto a single string, is 2^-64, which is sufficient for our purposes.

val pack_of_conn : Conn.id -> Buf.t
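
A minimal sketch of such a mapping is given below, using OCaml's standard Digest module (MD5). The record conn_id is a hypothetical, much-reduced stand-in for Conn.id, the serialization format is an assumption, and the result is a plain string rather than a Buf.t.

```ocaml
(* Hypothetical, much-reduced connection id; the real Conn.id has more fields. *)
type conn_id = {
  group : string;
  view_id : int option;
  sndr_mbr : int;
  dest_mbr : int;
}

(* Serialize the fields unambiguously, then hash to a fixed 16-byte key. *)
let pack_of_conn (id : conn_id) : string =
  let view = match id.view_id with None -> "-" | Some v -> string_of_int v in
  let repr =
    (* The group name is length-prefixed so field boundaries cannot shift. *)
    Printf.sprintf "%d:%s|%s|%d|%d"
      (String.length id.group) id.group view id.sndr_mbr id.dest_mbr
  in
  Digest.string repr
```

Because every packed id has the same length, a receiver can read the key from a fixed position in the header and use it directly as a lookup key.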

The purpose of the route module is to create a single interface to these various endpoints. The main type exported is handlers. This is essentially a large array holding the set of connection identifiers and the delivery function for each of them. When a message is received by the bottom-most part of the system, the socket code splits it into an ML header, which is a string, and the rest of the message, which is received into a C io-vector. Both parts are later passed to the deliver function.

val deliver : handlers -> Buf.t -> Buf.ofs -> Buf.len -> Iovecl.t -> unit

deliver takes the current set of handlers and a message, figures out which endpoints need to receive the message, and calls the appropriate handlers. A transmission function is abstracted as the type xmitf:

(* transmit an Ensemble packet, this includes the ML part, and a
 * user-land iovecl.
 *)
type xmitf = Buf.t -> Buf.ofs -> Buf.len -> Iovecl.t -> unit
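
The interplay of handlers, deliver, and xmitf can be sketched as below. This is an illustrative model only: it stores handlers in a hash table rather than an array, represents buffers and io-vectors as plain strings, and assumes that a 16-byte packed connection id sits at the start of the header. The functions create, install, and loopback are hypothetical names.

```ocaml
(* Hypothetical types: packed ids and payloads are plain strings here. *)
type handler = string -> unit
type handlers = (string, handler list) Hashtbl.t
type xmitf = string -> int -> int -> string list -> unit

(* Assumed: a 16-byte packed connection id leads the header. *)
let key_len = 16

let create () : handlers = Hashtbl.create 16

(* Install a delivery function for a packed connection id. *)
let install (h : handlers) (key : string) (f : handler) : unit =
  let old = Option.value (Hashtbl.find_opt h key) ~default:[] in
  Hashtbl.replace h key (f :: old)

(* Look up the id at the front of the header slice and call every handler
   registered for it; messages for unknown ids are dropped. *)
let deliver (h : handlers) (hdr : string) (ofs : int) (len : int)
    (iovl : string list) : unit =
  if len >= key_len then begin
    let key = String.sub hdr ofs key_len in
    let rest = String.sub hdr (ofs + key_len) (len - key_len) in
    let msg = rest ^ String.concat "" iovl in
    List.iter (fun f -> f msg)
      (Option.value (Hashtbl.find_opt h key) ~default:[])
  end

(* A loopback transmit function: "sending" delivers to local handlers. *)
let loopback (h : handlers) : xmitf =
  fun hdr ofs len iovl -> deliver h hdr ofs len iovl
```

Since xmitf and deliver take the same four message arguments, a transport's receive path can call deliver on exactly what a peer's xmitf was given.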

The Route module has an API for creating send/recv functions for connection ids. It also allows such functions to be installed and deleted. The unsigned router is a simple example of using this functionality to create the basic, insecure router. It defines the function f:

val f : unit -> 
  (Trans.rank -> Obj.t option -> Trans.seqno -> Iovecl.t -> unit) Route.t

This router allows users to send (1) a sender rank, (2) an optional ML object, (3) a sequence number, and (4) a user io-vector array. The body of the code calls Route.create, where it mainly needs to define how it handles blast and merge. Blast defines how to send messages; merge defines how to receive a message on behalf of several connection ids.
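
The two roles can be sketched as a pair of functions. This is a hedged illustration, not the code of unsigned.ml: the types below are simplified stand-ins (a string option replaces Obj.t option, a string list replaces Iovecl.t), the header is encoded with the standard Marshal module as an assumption, and the exact arguments that Route.create expects are not reproduced here.

```ocaml
(* Hypothetical stand-ins for the Ensemble types. *)
type rank = int
type seqno = int
type iovl = string list
type xmitf = string -> int -> int -> iovl -> unit
type handler = rank -> string option -> seqno -> iovl -> unit

(* blast: turn a raw transmit function into a typed send function by
   marshalling the (rank, object, seqno) triple into the ML header. *)
let blast (xmit : xmitf) : handler =
  fun rank obj seqno iovl ->
    let hdr = Marshal.to_string (rank, obj, seqno) [] in
    xmit hdr 0 (String.length hdr) iovl

(* merge: combine the handlers of several connection ids into one function
   that unmarshals the header once and calls each handler in turn. *)
let merge (hs : handler array) : xmitf =
  fun hdr ofs _len iovl ->
    let (rank, obj, seqno) : rank * string option * seqno =
      Marshal.from_string hdr ofs in
    Array.iter (fun h -> h rank obj seqno iovl) hs
```

Composing the two, blast (merge handlers) sends a message straight to a set of local handlers, which is a convenient way to exercise the encoding without a network.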