
Avoiding deadlocks using synchronous group method invocations

Synchronous group messages (method calls) are a powerful tool: they raise the peer protocol abstraction level to that of normal method invocations. Protocols can therefore be described as a set of simple procedures (methods); these procedures can be invoked remotely, and the procedure bodies contain the protocol logic. Not having to write complex code for low-level tasks such as request/response correlation makes the resulting code clearer and shorter. Viewing a protocol layer as a class with methods makes it much easier to understand. Thus, an interaction between two peer protocols boils down to a remote method invocation.

However, as with all types of concurrent and distributed processing, there is the danger of deadlock. The ordering requirement in group communication settings aggravates the problem: a method invocation caused by the reception of a message may not be overtaken ('passed') by another method invocation. Modern object request brokers (ORBs) serve every request on a separate thread (taken from a thread pool), which avoids deadlock in most cases. We cannot introduce such concurrency to handle requests, since it would break the ordering guaranteed by the protocol stack^5.1.

To illustrate the problem, consider the protocol interaction typically performed in a GMS layer when a new client joins a group, as shown in fig. 5.1.

**Figure 5.1:** Client joining a group

Method Join would be invoked by a client on the coordinator, which would add the new member and then invoke View on all members. In the example, P is the new member and Q the current coordinator (Q and R form the existing group). To join the group, P invokes a unicast synchronous Join on the coordinator. The method returns true if the member has been joined successfully and false otherwise, in which case the client has to retry until it succeeds. When a Join method is invoked on the coordinator (1), the coordinator adds the new member to its local view and invokes a synchronous View method on all members (2). When the View method invocation has returned from all members (3), the Join method returns (4). The two method invocations in the example were chosen to be synchronous because a return signals that the invocation succeeded (a one-way method invocation does not return) and, in the case of Join, the return value (true or false) is needed to decide whether the join was successful.
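The four steps of the interaction can be sketched as a small in-process model. This is only an illustration: the class JoinInteractionSketch and its method names are hypothetical and not part of JavaGroups, remote invocations are replaced by plain local calls, and returning false for a member that is already in the view is an assumption made so that the failure case has something to show.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-process model of the Join/View interaction; not JavaGroups code.
class JoinInteractionSketch {
    // The membership list held by the coordinator.
    private final List<String> coordinatorView = new ArrayList<>();
    // The view currently installed at each member, keyed by member name.
    private final Map<String, List<String>> installedViews = new LinkedHashMap<>();

    JoinInteractionSketch(String... initialMembers) {
        for (String member : initialMembers) {
            coordinatorView.add(member);
        }
        for (String member : initialMembers) {
            installedViews.put(member, new ArrayList<>(coordinatorView));
        }
    }

    // View: installs the new membership list at one member and confirms it.
    private boolean view(String member, List<String> newView) {
        installedViews.put(member, new ArrayList<>(newView));
        return true;
    }

    // Join (1): executed at the coordinator on behalf of the joining member.
    boolean join(String newMember) {
        if (coordinatorView.contains(newMember)) {
            return false; // assumption: a duplicate join is reported as a failure
        }
        coordinatorView.add(newMember);
        // (2) invoke View on every member and (3) wait for each to return.
        for (String member : coordinatorView) {
            if (!view(member, coordinatorView)) {
                return false;
            }
        }
        return true; // (4) Join returns only after all View calls returned
    }

    List<String> viewOf(String member) {
        return installedViews.get(member);
    }

    public static void main(String[] args) {
        JoinInteractionSketch group = new JoinInteractionSketch("Q", "R");
        System.out.println("Join returned " + group.join("P"));
        System.out.println("View at P: " + group.viewOf("P"));
    }
}
```

Because every call here is a direct local call, the model cannot block; it only shows the order of the steps.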

However, making both methods synchronous leads to disaster in our protocol stack, which delivers messages (and hence method invocations) in FIFO order. When Join is invoked synchronously on the coordinator, the coordinator multicasts a synchronous View method invocation to all members, including itself. The coordinator then blocks waiting for its own View invocation to return, while that invocation is still waiting to be processed. This is shown in fig. 5.2.

**Figure 5.2:** Request handler queue in a deadlock situation

As the GMS protocol is based on RpcProtocol, there are two queues: an up queue for incoming method invocations and a down queue for outgoing ones. When a request (method invocation) is received, it is added to the up queue. A request handler thread continually retrieves methods from the up queue and invokes them synchronously, waiting for each to complete before handling the next. The figure shows the queues at the coordinator. A client has sent a synchronous Join method which, being at the head of the queue, is removed from the queue and processed. At some point the Join method invokes a synchronous View on all members. To do so, a message containing the View method call is put in the down queue to be sent down the stack. Since the view is sent to all members, it is also received by the sender: it is added to the up queue, where it waits for the request handler thread to remove and process it. However, the request handler is still busy processing Join, waiting for it to return (because it is synchronous). Join in turn is waiting for the View methods it sent to all members to return: Join returns only once all View methods have been executed and have returned. This is a clear deadlock, caused by two synchronous method invocations. The thick lines in the figure show the recursion causing the deadlock.
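The situation can be reproduced in a few lines with a single-threaded executor standing in for the request handler and its up queue. This is a minimal sketch under stated assumptions: UpQueueDeadlockDemo and its methods are hypothetical names, the network and the down queue are omitted, and a timeout replaces the indefinite wait so that the program terminates.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical single-process model of the coordinator; not JavaGroups code.
class UpQueueDeadlockDemo {
    // One thread serves the "up queue" in FIFO order, like the request handler.
    private final ExecutorService requestHandler = Executors.newSingleThreadExecutor();

    // View: would install the new view; here it only confirms success.
    private boolean view() {
        return true;
    }

    // Join: runs on the request handler thread and synchronously invokes View
    // on the group, which includes the coordinator itself.
    private boolean join(long timeoutMillis) {
        Callable<Boolean> viewCall = this::view;
        // Local delivery: the View request lands in our own up queue.
        Future<Boolean> viewReply = requestHandler.submit(viewCall);
        try {
            // Synchronous call: wait for View to return. It cannot, because the
            // only handler thread is the one waiting right here.
            return viewReply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return false; // the timeout stands in for "blocked forever"
        } catch (InterruptedException | ExecutionException e) {
            return false;
        }
    }

    // Returns true if Join completed, false if it got stuck waiting for View.
    static boolean joinCompletes(long timeoutMillis) {
        UpQueueDeadlockDemo coordinator = new UpQueueDeadlockDemo();
        try {
            Callable<Boolean> joinCall = () -> coordinator.join(timeoutMillis);
            Future<Boolean> joinReply = coordinator.requestHandler.submit(joinCall);
            return joinReply.get();
        } catch (InterruptedException | ExecutionException e) {
            return false;
        } finally {
            coordinator.requestHandler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Join completed: " + joinCompletes(200)); // prints "Join completed: false"
    }
}
```

Without the timeout, the call to get() inside join would never return, which is the deadlock described above.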

Note that this problem might not occur in multithreaded servers: since the View method would be executed on a different thread, Join would receive all responses and thus not block. However, as our protocol stack has to observe ordering, we can only use a single thread to process requests, which ensures FIFO order of request processing.

As can be seen, one has to be very careful when constructing method call chains containing synchronous method calls. A certain degree of recursion is always involved in distributed group communication systems, as requests sent to the group will always also be received by the sender (unless local delivery is turned off, cf. section 3.3.11).

As we can see, most deadlock problems occur when a message in the up queue blocks the request handler thread from processing other requests. The down queue is rarely a problem, as requests are just passed down the stack and then put on the network.

There are a number of workarounds to the above problems. First, careful design of protocols may reduce the number of synchronous method calls by simplifying the interactions between peer protocols. In the above example, the View method invocation may be made asynchronous, i.e. the CastView method would return immediately after sending out the View method invocations. In this case, no deadlock would occur. The purpose of having made View synchronous, namely to receive confirmation from each member that it received the view, can be implemented by having an acknowledgment/retransmission layer somewhere below the GMS layer: messages sent will always be received by all (non-faulty) members.
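The asynchronous variant can be sketched with the same kind of single-threaded request handler. Again this is an illustration under assumptions: AsyncViewDemo and its members are hypothetical names, not JavaGroups classes, and the acknowledgment/retransmission layer is not modelled. Join enqueues the View request and returns without waiting, so the handler thread is free to process View next.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical single-process model of the coordinator; not JavaGroups code.
class AsyncViewDemo {
    // One thread serves the "up queue" in FIFO order, like the request handler.
    private final ExecutorService requestHandler = Executors.newSingleThreadExecutor();
    // Records the order in which things happen at the coordinator.
    private final List<String> events = new CopyOnWriteArrayList<>();
    private final CountDownLatch viewInstalled = new CountDownLatch(1);

    // View: processed by the request handler once Join has returned.
    private void view() {
        events.add("View installed");
        viewInstalled.countDown();
    }

    // Join: sends View as a one-way invocation and returns immediately.
    private void join() {
        requestHandler.execute(this::view); // enqueue only, no reply awaited
        events.add("Join returned");
    }

    static List<String> run() {
        AsyncViewDemo coordinator = new AsyncViewDemo();
        coordinator.requestHandler.execute(coordinator::join);
        try {
            // Wait until View has run before stopping the handler.
            coordinator.viewInstalled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            coordinator.requestHandler.shutdown();
        }
        return coordinator.events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Join returned, View installed]
    }
}
```

The order of the two events is fixed here because both run on the single handler thread: View cannot start until Join has finished.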

Another solution would be to have a separate thread invoke the View on the group: CastView would spawn a new thread (or reuse a thread from a thread pool) to handle the view installation and return after the thread has been created. This would cause Join to return without blocking, thus avoiding deadlock. However, introducing concurrency in this case leads to weaker guarantees about ordering between the view change notification and the return of the Join call: contrary to fig. 5.1, a joiner may return from the Join method before having received the view change. The client protocol would have to account for this behavior with additional code, adding more complexity.
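A sketch of the separate-thread approach follows, assuming the same simplified model of a single request handler thread. SpawnedCastViewDemo, castView and the thread name are hypothetical; a real implementation might take the thread from a pool instead. The View invocation stays synchronous, but the waiting happens on the helper thread rather than on the request handler.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical single-process model of the coordinator; not JavaGroups code.
class SpawnedCastViewDemo {
    // One thread serves the "up queue" in FIFO order, like the request handler.
    private final ExecutorService requestHandler = Executors.newSingleThreadExecutor();
    private final CountDownLatch viewConfirmed = new CountDownLatch(1);

    // View: would install the new view; here it only confirms success.
    private boolean view() {
        return true;
    }

    // Runs on the helper thread: a synchronous View, but the request handler
    // thread is not the one waiting for the reply.
    private void castView() {
        Callable<Boolean> viewCall = this::view;
        Future<Boolean> viewReply = requestHandler.submit(viewCall);
        try {
            if (viewReply.get(5, TimeUnit.SECONDS)) {
                viewConfirmed.countDown();
            }
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            // The view stays unconfirmed in this sketch.
        }
    }

    // Join: hands the view installation to a new thread and returns at once,
    // possibly before any member has installed the view.
    private boolean join() {
        new Thread(this::castView, "view-caster").start();
        return true;
    }

    // Returns true if Join returned and the View was later confirmed.
    static boolean joinAndViewComplete() {
        SpawnedCastViewDemo coordinator = new SpawnedCastViewDemo();
        try {
            Callable<Boolean> joinCall = coordinator::join;
            boolean joined = coordinator.requestHandler.submit(joinCall).get(5, TimeUnit.SECONDS);
            boolean confirmed = coordinator.viewConfirmed.await(5, TimeUnit.SECONDS);
            return joined && confirmed;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            return false;
        } finally {
            coordinator.requestHandler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Join and View completed: " + joinAndViewComplete());
    }
}
```

Note that join returns true before the view is confirmed, which is exactly the weaker ordering guarantee the text warns about.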

JavaGroups contains a GMS protocol (JavaGroups.JavaStack.Protocols.GMS) that makes use of synchronous method invocations. However, a newer version is currently under development: JavaGroups.JavaStack.Protocols.RpcGMS is derived from RpcProtocol and uses (synchronous) method invocations to implement its functionality.

A third solution to the above problem is to enable the RequestCorrelator to detect deadlocks by setting a flag. This is described in the next section.


1999-08-19