State Transfer in Maestro

This document is a part of online Ensemble documentation, under Maestro Open Toolkit.

The state-transfer functionality is provided in Maestro by two classes: Maestro_ClSv implements a state-transfer protocol (which is discussed below) and offers a low-level interface to it. The Maestro_CSX class offers a higher-level interface to state transfer; users of Maestro will want to use it in most cases.

The implementation of state transfer in Maestro follows the "pull" paradigm, in which a joining server asks old server(s) for pieces of the global state, and decides by itself when state transfer has been completed. In the "push" paradigm (used in Isis), it is a responsibility of old servers to decide how to transfer their state to the joining server, and how to handle failures during state transfer. It appears that the "pull" approach is simpler to implement and is more flexible.

In sections below, we show how the state is transferred in two important situations (new server joining the group, and two group partitions merging), and discuss the state transfer protocol.


State Transfer: A Server Joins the Group

The typical scenario is a new process X joining the group as a server. This occurs in several stages. Initially, X joins as a client (Stage 1). It then sends a becomeServer request to the group coordinator (Stage 2), which flushes the group and installs a new view. In that view, X is included in the list of xfer-servers (Stage 3). From that moment on, X has all "rights" of normal servers, but no "responsibilities:"

  • While X is among xfer-servers, it will receive the same messages (casts and scasts) as normal servers.
  • However, X is not required to do any service as long as it is in the state-transfer stage.

    When X becomes an xfer-server, it starts state transfer. The state is transfered in a series of read requests to normal servers (Stage 4). When all the state has been received, X sends an xferDone message to the coordinator (stage 5). The coordinator flushes the group and installs a new view, in which X is included in the list of servers. At that point, X becomes a normal server.


    State Transfer: Two Group Partitions Merge

    There are different approaches to dealing with state merges and global state consistency issues in general. Appropriateness of specific policies is very much application dependent. However, Ensemble implements a state transfer protocol, and Maestro provides another one. Both protocols do not have built-in mechanisms for global state consistency. We are planning to port the primary-views layer from Horus to Ensemble and adapt Maestro to it. The state-transfer protocol of the primary-views layer is strong enough to guarantee global total ordering in a partitionable network. However, in this document we discuss the current implementation of state transfer in Maestro.

    When two partitions merge, the servers in one of them will formally lose their state and become "degraded" to the client status. At that point all degraded servers will restart state transfer from the other partition. It is a responsibility of the application to actually transfer the state and notify Maestro of state-transfer completion. The role of Maestro is to notify the application when state transfer is deemed necessary, and to specify the direction of the transfer. The state-transfer model of Maestro thus assumes that during a view merge there is a transfer of state from one partition to the others, rather than a merge of states from individual partitions. This model is justified for an important class of applications (namely, those requiring the primary-view execution). The state transfer protocol of Ensemble is more general in allowing merge of states from several partitions. We are planning to eventually extend the Maestro interface to support both state-merge and state-transfer models.

    Here is what happens when a group merge occurs. Initially, there are two partitions with possibly different global states (Stage 0). Eventually, the two partitions merge. When that happens, all members of one of the partitions (servers, xfer-servers, and clients) suddenly find themselves in the clients list (Stage 1). That triggers a restart of state transfer by all degraded servers (Stage 2). When state transfer completes, the servers list of the merged group includes servers from both partitions (Stage 3).


    State Transfer: The Protocol

    The aim of the state transfer protocol of Maestro is to provide some kind of convergence properties to the global state. The global state in this context means some (application-defined) data that is consistently maintained by a subset of the group members, which are called servers. When a new member joins the group and wants to become a server, it has to receive the current state of the group first. Among possible problems are:

  • Old servers may crash during state transfer (in particular, it may happen that all servers but the joining one crash);

  • New messages may be sent/received during state transfer, which may result in (potentially inconsistent) modifications to the global state;

  • The group may merge with another partition, with a different opinion on what the current state is.

    The implementation of state transfer in HOT attempts to deal with these problems in the following way. Each group member has a membership type, which is either Client or Server. After the join downcall is invoked, a group member is always in one of the following states:

    	JOINING
    	CLIENT_NORMAL
    	BECOMING_SERVER
    	SERVER_XFER
    	SERVER_XFER_DONE
    	SERVER_NORMAL
  • It is guaranteed that if the membership type is Client, then the state of the member will eventually become CLIENT_NORMAL, after which it will always be CLIENT_NORMAL.

  • If the membership type is Server, and state transfer option is NO_XFER, then it is guaranteed that the state of the member will eventually become SERVER_NORMAL. After that, whenever the state of the member becomes different from SERVER_NORMAL, it will eventually become SERVER_NORMAL again.

  • Suppose the membership type is Server, and state transfer option is other than NO_XFER. Suppose also that whenever the state of the member changes to SERVER_XFER (as specified by the VIEW upcall parameters), that is eventually followed by an xferDone downcall.

    Then it is guaranteed that the state of the member will eventually become SERVER_NORMAL. After that, whenever the state of the member becomes different from SERVER_NORMAL, it will eventually become SERVER_NORMAL again.

  • Note that even after a joining server has received the state and become a normal server, it may lose its server status as a result of a group merge. If that happens, the state transfer will be restarted automatically.

  • If a group merge occurs during state transfer, the transfer will be terminated and restarted again.

  • A state transfer will also be terminated if all normal servers crash before the transfer completes.
    send mail to alexey@cs.cornell.edu