
Interaction between protocol layers implementing view changes

The view change interaction between the four layers described above is shown in Figure 6.1.


  
Figure 6.1: Protocol interaction for a view change [image: ProtInteraction.eps]

The RpcGMS layer triggers the view change protocol when it (a) receives a join, leave or suspect message and (b) is the current coordinator (a designated member that takes care of view changes, usually the oldest member). It first starts the FLUSH protocol by sending down a FLUSH event. If no FLUSH layer is present, RpcGMS times out and continues without running the FLUSH protocol; it simply broadcasts the new view.
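The trigger condition and the timeout fallback can be sketched as follows. This is only an illustration of the logic described above, not the RpcGMS code: the class `ViewChangeTrigger`, the enum `MembershipMsg`, the method names, and the convention that the view lists members oldest first are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; these names are hypothetical, not JavaGroups API. */
final class ViewChangeTrigger {

    /** The membership messages the text mentions, plus a catch-all. */
    enum MembershipMsg { JOIN, LEAVE, SUSPECT, OTHER }

    /** Assumption: the view lists members oldest first, so index 0 is the coordinator. */
    static boolean isCoordinator(List<String> view, String localMember) {
        return !view.isEmpty() && view.get(0).equals(localMember);
    }

    /** Conditions (a) and (b): a membership message arrived and we are the coordinator. */
    static boolean shouldStartViewChange(MembershipMsg msg, List<String> view, String localMember) {
        boolean membershipChange = msg == MembershipMsg.JOIN
                || msg == MembershipMsg.LEAVE
                || msg == MembershipMsg.SUSPECT;
        return membershipChange && isCoordinator(view, localMember);
    }

    /**
     * Waits for the flush to complete. Returns false on timeout (for example,
     * when no FLUSH layer is in the stack); the caller would then go straight
     * to broadcasting the new view.
     */
    static boolean awaitFlush(CountDownLatch flushDone, long timeoutMillis) {
        try {
            return flushDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

A coordinator would call `shouldStartViewChange` for each incoming membership message and, if it returns true, send down the FLUSH event and wait with `awaitFlush` before installing the view.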

When the FLUSH layer receives the FLUSH event, it broadcasts a FLUSH message to all members (including itself). This causes each member to send its pending messages and then stop sending further messages until a new view has been received. To do so, each member sends a BLOCK event up the stack. At the top level, the channel notifies the application that it must not send any more messages until it receives a view change. Messages sent nevertheless are simply queued until the new view has been installed, and are then delivered in the next view. In response, a BLOCK_OK event is generated and sent down the stack.

When the FLUSH layer receives the BLOCK_OK event, it is ready to proceed: it has to collect all unstable messages and return them to the coordinator (the member that broadcast the FLUSH message). To collect them, it sends a GET_UNSTABLE_MSGS event down to the NAKACK layer underneath it. The NAKACK layer periodically receives a STABLE event from the STABLE layer underneath it and removes all stable messages from its 'sent messages' store. It therefore essentially returns all messages in 'sent messages' (see footnote 6.2). These unstable messages are returned in a GET_UNSTABLE_MSGS_OK event. When the FLUSH layer receives this event, it returns the unstable messages to the coordinator.

The coordinator now merges the unstable messages returned from all members, removing duplicates, and re-sends them. It delegates the resending to the NAKACK layer underneath by sending down a REBROADCAST_MSGS event.
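The coordinator's merge step can be illustrated with a small sketch. The text does not say how duplicates are recognised, so the example assumes that a message is uniquely identified by its sender and a per-sender sequence number; `UnstableMessageMerger` and `Msg` are hypothetical names, not JavaGroups classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Illustrative sketch only; not the FLUSH layer's actual code. */
final class UnstableMessageMerger {

    /** Assumption: a message is uniquely identified by (sender, seqno). */
    static final class Msg {
        final String sender;
        final long seqno;
        final String payload;

        Msg(String sender, long seqno, String payload) {
            this.sender = sender;
            this.seqno = seqno;
            this.payload = payload;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Msg)) return false;
            Msg m = (Msg) o;
            return seqno == m.seqno && sender.equals(m.sender);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sender, seqno);
        }
    }

    /**
     * Merges the unstable-message lists returned by all members into one
     * list without duplicates, keeping the order of first occurrence.
     */
    static List<Msg> merge(Collection<List<Msg>> perMember) {
        Set<Msg> merged = new LinkedHashSet<>();
        for (List<Msg> fromOneMember : perMember) {
            merged.addAll(fromOneMember); // equal messages are added only once
        }
        return new ArrayList<>(merged);
    }
}
```

The resulting list is what the coordinator would hand to the NAKACK layer in the REBROADCAST_MSGS event.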

When the NAKACK layer receives the REBROADCAST_MSGS event, it wraps each message in the list into a new message and broadcasts it to the group. When such a message is received, it is unwrapped and sent up the stack. Typically, most resent messages are simply discarded because they have already been seen. But if, for example, the sender crashed just after sending a message to only one member, all other members now receive that message, which results in uniform delivery. When all messages have been resent, the NAKACK layer returns a REBROADCAST_MSGS_OK event, which is caught by the RpcGMS layer.
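The wrap, unwrap and discard behaviour might look roughly like this. The sketch assumes that each original message carries an identifier by which a receiver can tell whether it has already seen it; `RebroadcastSketch`, `Wrapped` and `Receiver` are hypothetical names chosen for the example and do not reflect the NAKACK implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the NAKACK layer's actual code. */
final class RebroadcastSketch {

    /** Hypothetical wrapper carrying the original message's id and payload. */
    static final class Wrapped {
        final String originalId;
        final String payload;

        Wrapped(String originalId, String payload) {
            this.originalId = originalId;
            this.payload = payload;
        }
    }

    /** Coordinator side: wrap one original message for rebroadcast. */
    static Wrapped wrap(String originalId, String payload) {
        return new Wrapped(originalId, payload);
    }

    /** Receiver side: remembers which original messages were already delivered. */
    static final class Receiver {
        private final Set<String> seenIds = new HashSet<>();
        private final List<String> deliveredUp = new ArrayList<>();

        /** Normal delivery path for a message received before the flush. */
        void deliver(String id, String payload) {
            if (seenIds.add(id)) {
                deliveredUp.add(payload);
            }
        }

        /**
         * Unwraps a rebroadcast message. Returns true if it was new and was
         * passed up the stack, false if it was a duplicate and was discarded.
         */
        boolean onRebroadcast(Wrapped w) {
            if (!seenIds.add(w.originalId)) {
                return false; // already seen: discard
            }
            deliveredUp.add(w.payload);
            return true;
        }

        /** Payloads passed up the stack so far, in delivery order. */
        List<String> delivered() {
            return new ArrayList<>(deliveredUp);
        }
    }
}
```

A member that missed a message because its sender crashed would see `onRebroadcast` return true for it, which is the uniform-delivery case described above.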

Only when all unstable messages have been resent can the RpcGMS layer proceed to install the new view in all members. It does so by broadcasting a VIEW_CHANGE message using an ACK scheme (to ensure that all members have received the view). Once a member has received the view, it is again allowed to send messages to the group. Messages that were sent during the block, and were therefore queued, are now delivered in the new view.
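The effect of blocking and unblocking on the sending side can be sketched as a simple gate. This is an assumption-laden illustration, not the channel's implementation: `SendGate` and its methods are hypothetical, and messages are tagged with an integer view id purely to show in which view they go out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and the integer view id are assumptions. */
final class SendGate {
    private final List<String> queued = new ArrayList<>();
    private final List<String> sent = new ArrayList<>();
    private boolean blocked = false;
    private int viewId;

    SendGate(int initialViewId) {
        this.viewId = initialViewId;
    }

    /** Called when the BLOCK event arrives: stop sending until the next view. */
    void block() {
        blocked = true;
    }

    /** Sends immediately in the current view, or queues while blocked. */
    void send(String payload) {
        if (blocked) {
            queued.add(payload);
        } else {
            sent.add(viewId + ":" + payload);
        }
    }

    /** Called when the new view is received: unblock and flush the queue. */
    void installView(int newViewId) {
        viewId = newViewId;
        blocked = false;
        for (String payload : queued) {
            sent.add(viewId + ":" + payload); // queued messages go out in the new view
        }
        queued.clear();
    }

    /** Messages sent so far, each prefixed with the view id it was sent in. */
    List<String> sent() {
        return new ArrayList<>(sent);
    }
}
```

For example, a message sent between `block()` and `installView(2)` ends up tagged with view 2, matching the statement that queued messages are delivered in the new view.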



1999-12-13