next up previous contents
Next: ReusableThread Up: Utility classes Previous: View

   
SlidingWindow

To guarantee lossless delivery, a unique (increasing) sequence number (ID) is attached to every message. The combination of the message's sender address and its ID identifies every message uniquely. To overcome message loss, sender and receiver agree on some start ID, e.g. TCP connection establishment involves the sender and receiver exchanging their start sequence numbers in the SYN/ACK packets. Now the IDs of messages received from a sender S always have to monotonically increase: if message 56 was received from S, the next message has to be 57. If a gap is detected, the receiver sends a retransmission request to the sender of the message. Upon receiving a retransmission request, the sender resends the message with the requested ID. Being able to resend old messages requires the senders to store recently sent messages for some time, usually until it is know that the receiver(s) has/have received all messages below a certain ID. In this case the messages below this ID may be deleted. When a receiver only sends retransmission requests when a gap is detected, the mechanism is called negative acknowledgment NAK. NAK is used in systems where message loss is infrequent, so NAKs would only rarely have to be used to retransmit lost messages.

When communication links are noisy and loss rate high, ACK schemes are preferred: a sender has to receive an acknowledgment from each receiver of a message. If ACKs are not received, the message is resent until an ACK has been received.

The idea of sliding windows ([Tan89, ch 4.4] and [Bir96, 83-88]) is to keep track of the acknowledgements for each ID. However, a scheme in which a sender send a single message (e.g. to multiple receivers in a group) and then waits for all ACKs is to slow: a sender should be able to send a number of messages and a separate thread should receive ACKs, and resend messages with ACKs missing. The senders and receivers each maintain a window of messages for which no ACKs have been received: a window is essentially a sequence of message IDs, starting with a low water mark and bounded by a high water mark. Whenever an ACK is received, the low and high water marks are advanced by 1, this allows 1 more ACK to be received, therefore sliding the window 1 to the right. When the window is full, an ACK is either discarded, or some kind of flow control is used to throttle the sender until there is more space available.

Sliding windows usually start out with a given size, however, more sophisticated protocols will dynamically adapt the window size, trying to find an agreed-upon size between sender and receiver.

The characteristics of sliding windows used at the sender and receiver usually involve (but do not have to !) a) error correction (by retransmission), b) flow control and c) message ordering by sender (FIFO). The latter property can easily be incorporated in a sliding window protocol, but sometimes, it is preferred to be implemented as a separate protocol for easier maintenance / replaceability.

Class JavaGroups.SlidingWindow is based on the concept of sliding windows, however, it is based on NAKs, not ACKs. A receiver adds messages to the table, where they are inserted according to their sequence number (ID). A counter keeps track of the low water mark, i.e. the lowest sequence number. Method Remove tries to remove a message, which succeeds as long as there are messages whose sequence number is equal to the low water mark (incrementing the latter). E.g with sequence numbers 23 24 26 27, Remove would successfully remove the first 2 messages in turn before returning null. Only when the message with sequence number 25 has been added will Remove succeed to remove messages 25, 26 and 27 (advancing the low water mark to 28).

A separate retransmission thread is started in each sliding window object. Its task is to send NAKs to senders from which messages have not been received with increasing sequence numbers, e.g. in the above example (before receiving message 25), the retransmission thread might have sent a NAK for message 25 to its sender. To do so, it invokes method Retransmit in an object (implementing interface RetransmitCommand) which registered previously with the sliding window table. The retransmit command should send a retransmit request to the sender, causing the sender to resend the requested message. When a retransmit request to a sender has failed twice, the request will be multicast to all group members. When a group member receives a retransmission request of which it wasn't the sender, and it still has the message, it will retransmit the message to the requester (changing the message's sender field to that of the (apparently failed) original sender3.1. A SlidingWindow is configured with the sender from which messages are to be received (there is usually 1 sliding window per sender) and a retransmit command.

When a message has been received, it can be added to the sliding window table corresponding to the sender's address using method Add. If the message's sequence number is smaller than the table's low water mark, it is discarded. Otherwise, it is added in order, creating empty entries if necessary. For example, if messages 3, 6 and 7 are added to the table in succession, then the first entry is for 3, then 2 empty entries for 4 and 5 are added, and finally entries 6 and 7 are added. If the low water mark was at 3 before the messages were added, subsequent calls to Remove would remove message 3 and return null for each subsequent call, until messages 4 and 5 are added. The retransmission thread continually iterates through all entries and checks the age of each entry. If it exceeds a certain user-defined value, a retransmission request is sent using the retransmission object registered with the sliding window table in the constructor.

This implementation of SlidingWindow does not maintain a window size; it can rather store as many messages as are received. This is clearly a disadvantage and needs to be changed in a future release. However, using a flow control protocol layer would also correct the problem.


next up previous contents
Next: ReusableThread Up: Utility classes Previous: View

1999-08-19