Maestro Open Toolkit: Clients/Servers + State Transfer Interface

This document is part of the online Ensemble documentation, under Maestro Open Toolkit.

The state-transfer protocol of Maestro is implemented in the Maestro_ClSv class. The state-transfer interface provided by Maestro_ClSv is low-level, however, and leaves several details to the application. In particular, a joining server has to choose an old server from which to request the state, and it has to repeat the request if the chosen server crashes during state transfer. A joining server may also need to terminate state transfer and possibly restart it in some scenarios, so it must be able to distinguish between different state-transfer transactions. All of this would have to be implemented at the application level if Maestro_ClSv were used directly. Alternatively, the application can be built on top of the Maestro_CSX class, which provides this additional state-transfer functionality behind a higher-level interface.

Index of State Transfer Callbacks/Downcalls:

  • resetState
  • stateTransfer_Callback
  • getState (blocking)
  • getState (non-blocking)
  • askState_Callback
  • sendState
  • gotState_Callback
  • xferCanceled_Callback
  • xferDone
  • Code Examples:

  • Blocking (synchronous) state transfer (for multi-threaded mode only).
  • Non-blocking (asynchronous) state transfer

  • Maestro_CSX

    Maestro_CSX (a subclass of Maestro_ClSv) provides a higher-level interface to state transfer.

    It may happen (as a result of group partitioning and merging) that state transfer will be (re)started more than once at a given server. On the other hand, a state transfer may be terminated before completion (if all old servers crash or partition away during state transfer). Since it is possible that a new state transfer will be started before the completion of a previous one, Maestro_CSX assigns a unique ID to every state transfer transaction so that the application can distinguish between them.

    When state transfer needs to be (re)started, the stateTransfer_Callback method is invoked. If Maestro is run in the multi-threaded mode, stateTransfer_Callback is called in a separate thread. However, in the single-threaded mode stateTransfer_Callback is invoked in the same (Ensemble) thread as all other callbacks.

    A joining server can request (a portion of) the state from an old server with a getState downcall. There are two versions of getState, a blocking and a non-blocking one. In the multi-threaded mode, both versions of getState can be used. However, the non-blocking version of getState is the only choice when running Maestro in the single-threaded mode.

    A call to getState made at a joining server results in an invocation of the askState_Callback method at a normal ("old") server, which should eventually respond by sending a state message to the joining server with a call to the sendState function. When the joining server receives the state message, the corresponding call to getState returns (if the synchronous/blocking version of getState was called) or the gotState_Callback is invoked (in the asynchronous/non-blocking case).

    If a state-transfer transaction is terminated while the joining server is still waiting for a state message from an old server, the call to getState will return with the abnormal-termination status (in case of a blocking call), or else the xferCanceled_Callback method will be invoked (in case of a non-blocking call).

    Once the joining server has completed state transfer, it should invoke the xferDone method. Following that, a new view will eventually be installed, where the joining server will be included in the list of "normal" servers.

    The interface to the state transfer functionality provided by Maestro_CSX is described in sections below.


    resetState

    		void resetState();
  • This method is implemented as a no-op in the Maestro_CSX class. It can be overridden to do application-specific state initialization at the beginning of execution and during (re)start of state transfer.


    stateTransfer_Callback

    		void stateTransfer_Callback(Maestro_XferID &xferID);
  • This callback is invoked at a joining server when state transfer is (re)started. It is invoked in a separate thread if Maestro is running in the multi-threaded mode, and in the same (Ensemble) thread if Maestro is in the single-threaded mode. The argument, xferID, identifies the state transfer transaction. The default implementation of stateTransfer_Callback only invokes xferDone(xferID) (which completes the state transfer), and returns. The stateTransfer_Callback method should be overridden in a subclass of Maestro_CSX to implement application-specific state transfer functionality.

    Upon completion of state transfer, the application must notify Maestro by calling xferDone with the same value of xferID as the one passed to the corresponding invocation of stateTransfer_Callback.


    getState (blocking version)

    		void getState(Maestro_XferID &xferID, 
    			      Maestro_Message &requestMsg,
    			      /*OUT*/ Maestro_Message &stateMsg,
    			      /*OUT*/ Maestro_XferStatus &xferStatus);
  • This blocking method is invoked by a joining server (usually from within stateTransfer_Callback) to request a portion of the state from one of the normal (old) servers. The application can make as many calls to getState as necessary. However, the getState function is not reentrant. Also, in all invocations of getState, the value of the xferID argument must be equal to the value passed to the corresponding invocation of stateTransfer_Callback. The requestMsg argument contains the request message specifying which portion of the state is being requested. If getState returns with a success value (see below), stateMsg contains the reply message with the requested part of the state in it.

    When a call to getState returns, the value of the xferStatus argument will be equal to MAESTRO_XFER_OK if the state request has succeeded, and MAESTRO_XFER_TERMINATED if the transfer has been prematurely terminated (usually because of a group merge or a total failure/partitioning away of all normal servers). If state transfer has been terminated, stateTransfer_Callback should return without further attempts to get the state.

    The blocking version of getState can only be used when Maestro is running in the multi-threaded mode. In the single-threaded mode, the non-blocking version of getState must be used.
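    The request loop described above can be sketched as follows. This is a minimal sketch, not Maestro code: FakeCSX and the stand-in types are hypothetical replacements for Maestro_CSX, Maestro_Message, Maestro_XferID and Maestro_XferStatus so that the example compiles without the Maestro headers. The string body field, the failAt switch, and the three-part state layout are assumptions made for illustration. Only the getState signature, the two status values, and the xferDone call are taken from this document.

```cpp
#include <string>
#include <vector>

// Stand-ins for the Maestro types (hypothetical fields, not the real classes).
struct Maestro_Message { std::string body; };
struct Maestro_XferID { int id; };
enum Maestro_XferStatus { MAESTRO_XFER_OK, MAESTRO_XFER_TERMINATED };

// Fake base class imitating only the calls described in this section.
class FakeCSX {
public:
    virtual ~FakeCSX() {}
    virtual void stateTransfer_Callback(Maestro_XferID &xferID) { xferDone(xferID); }

    // Blocking getState: the fake answers immediately instead of waiting
    // for an old server. It reports termination on request number failAt.
    void getState(Maestro_XferID &, Maestro_Message &requestMsg,
                  Maestro_Message &stateMsg, Maestro_XferStatus &xferStatus) {
        if (served_ == failAt) {
            xferStatus = MAESTRO_XFER_TERMINATED;
            return;
        }
        ++served_;
        stateMsg.body = "state for " + requestMsg.body;
        xferStatus = MAESTRO_XFER_OK;
    }

    void xferDone(Maestro_XferID &) { done = true; }

    bool done = false;
    int failAt = -1;  // -1: never terminate the transfer
private:
    int served_ = 0;
};

// Application class: fetches the state in three parts.
class MyServer : public FakeCSX {
public:
    std::vector<std::string> state;

    void stateTransfer_Callback(Maestro_XferID &xferID) override {
        state.clear();
        for (int part = 0; part < 3; ++part) {
            Maestro_Message request, reply;
            request.body = "part " + std::to_string(part);
            Maestro_XferStatus status = MAESTRO_XFER_TERMINATED;
            getState(xferID, request, reply, status);  // same xferID every time
            if (status == MAESTRO_XFER_TERMINATED)
                return;  // transfer terminated: make no further attempts
            state.push_back(reply.body);
        }
        xferDone(xferID);  // all parts received
    }
};
```

    Note that the callback returns without calling xferDone when the status is MAESTRO_XFER_TERMINATED, as the text above requires.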


    getState (non-blocking version)

    		void getState(Maestro_XferID &xferID, 
    			      Maestro_Message &requestMsg,
    			      /*OUT*/ Maestro_XferStatus &xferStatus);
  • This non-blocking method is invoked by a joining server to request a portion of the state from one of the normal (old) servers. The requestMsg argument contains the request message specifying which portion of the state is being requested. Following a call to getState, the gotState_Callback method will eventually be invoked when the requested portion of the state has been successfully received from an old server. However, if the state-transfer transaction is canceled (for any reason) before the state is received, the xferCanceled_Callback method will be invoked instead of gotState_Callback.

    The application can make as many calls to getState as necessary. However, the getState function is not reentrant. Furthermore, after a call to (non-blocking) getState has been made, a subsequent invocation of the function can only be made after the portion of the state requested in the previous call to getState has been received (with a matching invocation of the gotState_Callback method).

    Also note that in all invocations of getState, the value of the xferID argument must be the ID of the current state-transfer transaction (which is the value of the xferID argument passed to the corresponding invocation of stateTransfer_Callback).

    The non-blocking (asynchronous) version of getState can be used when Maestro is running in either multi-threaded mode or single-threaded mode, and is the only option in the latter case. However, in the multi-threaded mode, the blocking (synchronous) version of getState can also be used.
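    One way to respect the one-outstanding-request rule is to issue the next request from inside gotState_Callback. The sketch below illustrates that chaining under stated assumptions: FakeCSX and the stand-in types are hypothetical and exist only so the example compiles without the Maestro headers, deliverReply is an invented test driver that plays the role of a reply arriving from an old server, and the three-part layout is made up. This document does not say what xferStatus reports for the non-blocking call, so the fake always sets MAESTRO_XFER_OK.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for the Maestro types (hypothetical fields, not the real classes).
struct Maestro_Message { std::string body; };
struct Maestro_XferID { int id; };
enum Maestro_XferStatus { MAESTRO_XFER_OK, MAESTRO_XFER_TERMINATED };

class FakeCSX {
public:
    virtual ~FakeCSX() {}
    virtual void stateTransfer_Callback(Maestro_XferID &xferID) = 0;
    virtual void gotState_Callback(Maestro_XferID &xferID,
                                   Maestro_Message &stateMsg) = 0;

    // Non-blocking getState: records the request and returns at once.
    void getState(Maestro_XferID &xferID, Maestro_Message &requestMsg,
                  Maestro_XferStatus &xferStatus) {
        pending_ = requestMsg;
        pendingID_ = xferID;
        hasPending_ = true;
        xferStatus = MAESTRO_XFER_OK;  // assumption, see lead-in
    }

    void xferDone(Maestro_XferID &) { done = true; }

    // Test driver: delivers the reply for the outstanding request, if any.
    bool deliverReply() {
        if (!hasPending_) return false;
        hasPending_ = false;
        Maestro_XferID id = pendingID_;
        Maestro_Message reply{"state for " + pending_.body};
        gotState_Callback(id, reply);
        return true;
    }

    bool done = false;
private:
    Maestro_Message pending_;
    Maestro_XferID pendingID_{0};
    bool hasPending_ = false;
};

class MyServer : public FakeCSX {
public:
    std::vector<std::string> state;

    void stateTransfer_Callback(Maestro_XferID &xferID) override {
        state.clear();
        request(xferID, 0);  // first request; the callback returns right away
    }

    void gotState_Callback(Maestro_XferID &xferID,
                           Maestro_Message &stateMsg) override {
        state.push_back(stateMsg.body);
        if (state.size() < parts_)
            request(xferID, state.size());  // previous part has arrived
        else
            xferDone(xferID);               // nothing left to fetch
    }

private:
    const std::size_t parts_ = 3;

    void request(Maestro_XferID &xferID, std::size_t part) {
        Maestro_Message msg{"part " + std::to_string(part)};
        Maestro_XferStatus status = MAESTRO_XFER_OK;
        getState(xferID, msg, status);
    }
};
```

    Because each new request is issued only after the previous reply has been handled, at most one getState call is outstanding at any time.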


    askState_Callback

    		void askState_Callback(Maestro_EndpID &origin, 
    				       Maestro_XferID &xferID,
    				       Maestro_Message &requestMsg);
  • This callback method is invoked at a normal (old) server when a state request from a joining server arrives (as a result of the joining server calling the blocking or non-blocking getState function). The origin argument is the endpoint ID of the new server requesting the state; xferID identifies the state transfer transaction; requestMsg contains a message from the joining server specifying which portion of the state is being requested.

    Each invocation of askState_Callback must eventually be followed by a call to the sendState function, which sends (the requested portion of) the state to the joining server.


    sendState

    		void sendState(Maestro_EndpID &dest,
    			       Maestro_XferID &xferID, 
    			       Maestro_Message &stateMsg);
  • This function must eventually be called after every invocation of askState_Callback at a "normal" (old) server. The value of the dest argument must be equal to the value of the origin argument in the corresponding invocation of askState_Callback. Similarly, the value of the xferID argument must be the same as that in askState_Callback. The stateMsg argument should contain the portion of the state requested with the corresponding invocation of askState_Callback.
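    The pairing of askState_Callback with sendState at an old server can be sketched as follows. This is an illustration under stated assumptions, not Maestro code: FakeCSX, its outbox, and the stand-in types with their fields are hypothetical and replace the real classes so the example compiles without the Maestro headers. Keeping the application state in a map keyed by section name is also an invented layout. Only the two signatures come from this document.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-ins for the Maestro types (hypothetical fields, not the real classes).
struct Maestro_Message { std::string body; };
struct Maestro_XferID { int id; };
struct Maestro_EndpID { std::string name; };

// Fake base class: sendState only records what would be sent.
class FakeCSX {
public:
    struct Sent { std::string dest; int xfer; std::string body; };
    std::vector<Sent> outbox;

    virtual ~FakeCSX() {}
    virtual void askState_Callback(Maestro_EndpID &origin,
                                   Maestro_XferID &xferID,
                                   Maestro_Message &requestMsg) = 0;

    void sendState(Maestro_EndpID &dest, Maestro_XferID &xferID,
                   Maestro_Message &stateMsg) {
        outbox.push_back(Sent{dest.name, xferID.id, stateMsg.body});
    }
};

class OldServer : public FakeCSX {
public:
    // Application state, keyed by a section name chosen by the application.
    std::map<std::string, std::string> table;

    void askState_Callback(Maestro_EndpID &origin, Maestro_XferID &xferID,
                           Maestro_Message &requestMsg) override {
        Maestro_Message reply;
        std::map<std::string, std::string>::const_iterator it =
            table.find(requestMsg.body);
        if (it != table.end())
            reply.body = it->second;
        // Every request gets a reply, even for an unknown section.
        // dest is the origin of the request, and xferID is passed through.
        sendState(origin, xferID, reply);
    }
};
```

    Here the reply is sent from within the callback. The text above only requires that sendState eventually be called, so an application could also defer the reply, as long as it keeps origin and xferID for the later call.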


    gotState_Callback

    		void gotState_Callback(Maestro_XferID &xferID,
    				       Maestro_Message &stateMsg);
  • This callback method is eventually invoked after a call to the non-blocking getState function, if the requested portion of the state has been successfully received from an old server. The xferID argument identifies the state-transfer transaction. The stateMsg argument contains the requested state message.


    xferCanceled_Callback

    		void xferCanceled_Callback(Maestro_XferID &xferID);
  • This callback method is invoked after a call to the non-blocking getState function if the state-transfer transaction is aborted. The xferID argument identifies the state-transfer transaction.
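    A joining server that accumulates partial state may want to discard it when a transaction is canceled, and to tell transactions apart by their ID. The sketch below shows one possible bookkeeping scheme. It is a standalone illustration: in a real application these methods would be overrides in a Maestro_CSX subclass, the stand-in types and their fields are hypothetical, and representing the ID as a comparable integer is an assumption. Whether Maestro can ever deliver a callback for an old transaction is not stated in this document, so the ID checks are purely defensive.

```cpp
#include <string>
#include <vector>

// Stand-ins for the Maestro types (hypothetical fields, not the real classes).
struct Maestro_XferID { int id; };
struct Maestro_Message { std::string body; };

class JoiningServer {
public:
    std::vector<std::string> partial;  // state received so far
    int activeXfer = -1;               // -1: no transfer in progress

    // A (re)started transfer begins from a clean slate.
    void stateTransfer_Callback(Maestro_XferID &xferID) {
        partial.clear();
        activeXfer = xferID.id;
        // A real implementation would issue a non-blocking getState here.
    }

    void gotState_Callback(Maestro_XferID &xferID, Maestro_Message &stateMsg) {
        if (xferID.id != activeXfer) return;  // not the current transaction
        partial.push_back(stateMsg.body);
    }

    // The transaction was aborted: drop what was received and wait for the
    // next stateTransfer_Callback instead of requesting more state.
    void xferCanceled_Callback(Maestro_XferID &xferID) {
        if (xferID.id != activeXfer) return;
        partial.clear();
        activeXfer = -1;
    }
};
```

    After a cancellation the server makes no further getState calls for that transaction, and a later stateTransfer_Callback with a new ID starts over.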


    xferDone

    		void xferDone(Maestro_XferID &xferID);
  • This method must be called by a joining server when its state transfer has been completed. The value of the xferID argument must be equal to the ID of the current state-transfer transaction (which is the value of xferID passed in the corresponding call to stateTransfer_Callback).


    send mail to alexey@cs.cornell.edu