In this section we look at a typical stack, namely
TOTAL:MBRSHIP:FRAG:NAK:COM:ATM. In this stack, COM provides unreliable
communication over a low-level network of choice; ATM was selected
in the example. NAK provides FIFO
ordering using a sequence number, FRAG provides fragmentation and
reassembly of large messages, MBRSHIP provides virtually synchronous
communication with respect to group membership, and TOTAL provides
totally ordered communication within group memberships.
If we know the one property of
Table 4 that ATM provides, then we can quickly find from
Table 3 the nine properties that this stack provides.
This section will visit each of these
layers in turn and clarify why these properties are obtained.
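The table lookup described above can be sketched as a bottom-up walk over the stack. The property names and the requires/provides entries below are hypothetical stand-ins derived from the layer descriptions in this section; they are not the actual contents of Tables 3 and 4. The sketch also assumes that a property, once provided, is inherited by every layer above it.

```python
# Hypothetical (requires, provides) entries; the real ones are in Tables 3 and 4.
LAYERS = {
    "ATM":     (set(),                      {"best_effort"}),
    "COM":     ({"best_effort"},            {"source_address"}),
    "NAK":     ({"source_address"},         {"fifo"}),
    "FRAG":    ({"fifo"},                   {"large_messages"}),
    "MBRSHIP": ({"fifo", "large_messages"}, {"virtual_synchrony"}),
    "TOTAL":   ({"virtual_synchrony"},      {"total_order"}),
}


def stack_properties(spec):
    """Return the properties of a stack written top-first, e.g. 'NAK:COM:ATM'."""
    available = set()
    # The rightmost layer is the bottom of the stack, so walk right to left.
    for name in reversed(spec.split(":")):
        requires, provides = LAYERS[name]
        missing = requires - available
        if missing:
            raise ValueError(f"{name} is missing {sorted(missing)} from below")
        # Assumption: properties are inherited upward and never removed.
        available |= provides
    return available


props = stack_properties("TOTAL:MBRSHIP:FRAG:NAK:COM:ATM")
```

With these assumed entries, a stack such as TOTAL:COM:ATM is rejected, because nothing below TOTAL provides virtual synchrony.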
The COM, NAK, and FRAG layers do not provide consistent views. A view at these layers is nothing but the set of destination endpoints for multicast messages. The COM layer translates the low-level network interface into the Common Protocol Interface. If necessary, COM keeps track of the source of messages (by pushing the address of the source endpoint on each outgoing message), and filters out spurious messages from endpoints not in its view.
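The source tracking and filtering done by COM can be illustrated with a minimal sketch. The class and method names are hypothetical and do not correspond to the Common Protocol Interface; a packet is modelled as a (source, payload) pair.

```python
class Com:
    """Illustrative COM-like layer: tags outgoing messages, filters incoming ones."""

    def __init__(self, my_address, view):
        self.my_address = my_address
        # At this layer a view is just the set of destination endpoints.
        self.view = set(view)

    def send(self, payload):
        # Push the source address onto the outgoing message.
        return (self.my_address, payload)

    def receive(self, packet):
        source, payload = packet
        if source not in self.view:
            return None  # spurious message from an endpoint outside the view
        return (source, payload)


a = Com("a", view={"a", "b"})
b = Com("b", view={"a", "b"})
delivered = b.receive(a.send("hello"))
dropped = b.receive(("intruder", "hello"))
```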
The NAK layer provides FIFO ordering of messages. To this end, it pushes a sequence number on each outgoing message, which the receiver checks. If the receiver detects message loss, it sends back a negative acknowledgement (NAK). The NAK layer buffers some messages for retransmission, and retransmits a requested message if it is still buffered. If not, it sends a placeholder that results in a LOST_MESSAGE event when received. Each endpoint occasionally multicasts its protocol status, so that buffered messages can be flushed and window-based flow control can be implemented. These status updates also allow the detection of failures or disconnections (when a status update is not received in time).
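The sequence-number check and the retransmission path can be sketched as follows, for a single sender and a single receiver. All names are hypothetical, the retransmission buffer is simply bounded by a message count, and the periodic status multicast (buffer flushing, flow control, failure detection) is left out.

```python
from collections import OrderedDict

LOST = object()  # placeholder sent when a requested message is no longer buffered


class NakSender:
    def __init__(self, buffer_size=4):
        self.next_seq = 0
        self.buffer_size = buffer_size
        self.buffer = OrderedDict()  # seq -> payload, oldest first

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.buffer[seq] = payload
        while len(self.buffer) > self.buffer_size:
            self.buffer.popitem(last=False)  # discard the oldest buffered message
        return (seq, payload)

    def retransmit(self, seq):
        # Answer a NAK: the buffered message, or the placeholder if it is gone.
        return (seq, self.buffer.get(seq, LOST))


class NakReceiver:
    def __init__(self):
        self.expected = 0
        self.pending = {}  # out-of-order packets waiting for a gap to close

    def receive(self, packet):
        """Return (events delivered in FIFO order, sequence numbers to NAK)."""
        seq, payload = packet
        if seq < self.expected:
            return [], []  # duplicate of something already delivered
        self.pending[seq] = payload
        events = []
        while self.expected in self.pending:
            item = self.pending.pop(self.expected)
            events.append("LOST_MESSAGE" if item is LOST else item)
            self.expected += 1
        naks = [s for s in range(self.expected, seq) if s not in self.pending]
        return events, naks
```

In this sketch, a receiver that sees sequence number 2 first holds it back and asks for 0 and 1; whether each request is answered with the message or with the placeholder depends only on what the sender still has buffered.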
Table 4: A list of protocol properties, each of which can either be a
requirement on the communication guarantees provided underneath the protocol,
or a guarantee that is provided by the protocol itself.
The FRAG layer provides fragmentation and reassembly of large messages. Typical networks have a limit on the size of messages they can transmit. When a user of the FRAG layer attempts to send a message that is larger than that maximum size, the FRAG layer splits the message into multiple fragments. On each fragment the FRAG layer pushes a boolean value that indicates whether it is the last one or not. The FRAG layer depends on FIFO ordering for reassembly. When the last fragment is received, it delivers the message.
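A minimal sketch of this scheme follows, assuming byte-string messages and one reassembler per source endpoint. The function and class names are hypothetical; each fragment is modelled as an (is_last, chunk) pair, and the reassembler relies entirely on fragments arriving in FIFO order.

```python
def fragment(message: bytes, max_size: int):
    """Split a message into (is_last, chunk) fragments of at most max_size bytes."""
    chunks = [message[i:i + max_size] for i in range(0, len(message), max_size)]
    if not chunks:
        chunks = [b""]  # an empty message still needs one (last) fragment
    return [(i == len(chunks) - 1, chunk) for i, chunk in enumerate(chunks)]


class Reassembler:
    """Rebuilds messages from one source; correct only under FIFO delivery."""

    def __init__(self):
        self.parts = []

    def receive(self, frag):
        is_last, chunk = frag
        self.parts.append(chunk)
        if not is_last:
            return None  # message not complete yet
        message = b"".join(self.parts)
        self.parts = []
        return message


frags = fragment(b"hello world", max_size=4)
r = Reassembler()
results = [r.receive(f) for f in frags]
```

A single boolean suffices here because the layer below already guarantees order; without FIFO delivery each fragment would need an explicit index.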
The MBRSHIP layer has been discussed in the previous section. It adds strong semantics to the VIEW upcall, that is, it guarantees that all members in the view that were also in the previous view have delivered the same messages. It relies on the FIFO ordering provided by the NAK layer, and on the FRAG layer for sending large messages.
The TOTAL layer, in turn, relies on virtually synchronous communication. During normal operation, it utilizes a token. A special "oracle" at each member decides who should get the token next. The oracle cannot always make the optimal decision for minimal overhead, but the protocol that the TOTAL layer uses comes close in many cases. In case of a failure, the token may be lost. This, however, is not a problem. During the flush, all members that did not get the token in time send their messages. These messages are not delivered, but buffered. When the new view is installed, each member that remains connected to the system is guaranteed to have all messages from the previous view, and a deterministic order can easily be constructed (e.g., messages are delivered in the order of the rank of the source). Another deterministic rule decides who the first token holder in this view is (e.g., the lowest ranked member), and normal operation can continue.
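The two deterministic rules given as examples above can be sketched as follows. The sketch assumes that a view is a list of member identifiers in rank order, that rank refers to the previous view, and that each buffered message carries a per-source sequence number used to keep messages from the same source in FIFO order. These representation choices and all names are hypothetical.

```python
def order_flushed(old_view, buffered):
    """Order messages buffered during the flush by rank of their source.

    old_view: member ids of the previous view, lowest rank first.
    buffered: iterable of (source, seq, payload) tuples, in any arrival order.
    """
    rank = {member: i for i, member in enumerate(old_view)}
    # Sort by source rank, then by per-source sequence number.
    return sorted(buffered, key=lambda m: (rank[m[0]], m[1]))


def first_token_holder(new_view):
    """The lowest ranked member of the new view starts with the token."""
    return new_view[0]


old_view = ["a", "b", "c"]
# Two surviving members hold the same messages but received them in different orders.
seen_by_a = [("c", 0, "z"), ("b", 0, "x"), ("b", 1, "y")]
seen_by_c = [("b", 1, "y"), ("c", 0, "z"), ("b", 0, "x")]
order_a = order_flushed(old_view, seen_by_a)
order_c = order_flushed(old_view, seen_by_c)
```

Because virtual synchrony guarantees that the survivors hold the same set of messages, applying the same sort at each of them yields the same delivery order without any further communication.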
Interestingly, the TOTAL layer does not require direct interaction with a failure detector. As providing totally ordered communication is equivalent to the consensus problem, this seems contrary to the impossibility proof of [7]. TOTAL works nevertheless, for two reasons. First, the semantics that the TOTAL layer provides are slightly weaker, since it only guarantees timely delivery to the surviving members in the view. Second, failure information is provided by the MBRSHIP layer in the form of view updates.