C Bimodal Multicast (by Ken Birman, Mark Hayden, and Zhen Xiao)
There are many methods for making a multicast protocol
reliable. The majority of the protocols in Ensemble aim to provide
virtually synchronous properties. However, these properties come at
a price: performance can become unstable or unpredictable
under stress, and scalability is limited. This is
unacceptable for applications in which system stability and
scalability are viewed as inseparable from other aspects of
reliability.
This section describes a bimodal multicast protocol that not only has
much better scalability properties but also provides predictable
reliability even under highly perturbed conditions. This work is
described in the Bimodal Multicast paper
(ncstrl.cornell/TR98-1683) by Ken Birman, Mark Hayden, Oznur Ozkasap,
Zhen Xiao, Mihai Budiu and Yaron Minsky (this documentation is based
on that paper). The original version of the protocol was implemented
by Mark Hayden. It was reimplemented by Zhen Xiao with many new
optimizations and is described in the ZBCAST layer. In the
remainder of the section, we will refer to our new protocol as Zbcast.
C.1 Protocol description
The Zbcast protocol consists of two stages:
- During the first stage, the protocol multicasts each message
using an unreliable multicast primitive. IP multicast serves
this purpose if it is available. Otherwise, a randomized
tree-dissemination protocol is used (the GCAST layer). In the latter
case, the protocol uses the Ensemble group membership manager to
track membership information.
- The second stage uses an anti-entropy protocol to detect and
correct message losses in the group. The protocol consists of a
series of rounds. In each round, every member randomly chooses
another member and exchanges message histories with it. A member
compares the received message history with its own. If it finds
that it is missing a message, it solicits a copy
of that message from the original sender.
A member records the round in which a message is received. The
message will be gossiped until it has not been requested for
retransmission for a prespecified number of rounds. At that point
the member will garbage collect that message. Due to the
probabilistic nature of the protocol, it is possible for a message to
be garbage collected by other members while some operational member
has not received it yet. In such cases, the member missing the
message will report a message gap to the application.
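The two stages and the garbage-collection rule described above can be illustrated with a small simulation. This is a toy sketch in Python, not the ZBCAST layer itself: the names `Member`, `multicast`, `reconcile`, and `gossip_round`, and the parameters `loss_rate` and `gc_rounds`, are hypothetical. It assumes that a history is the set of messages a member still buffers, that the original sender can retransmit only while it still buffers the message, and that a member reports a gap when its solicitation can no longer be served.

```python
import random


class Member:
    """One group member in a toy model of the two-stage protocol."""

    def __init__(self, name):
        self.name = name
        self.delivered = set()  # ids of messages handed to the application
        self.buffer = {}        # id -> rounds since the message was last requested
        self.gaps = set()       # ids reported to the application as lost

    def receive(self, msg_id):
        if msg_id not in self.delivered:
            self.delivered.add(msg_id)
            self.buffer[msg_id] = 0


def multicast(members, sender, seqno, loss_rate, rng):
    """Stage 1: unreliable multicast; each receiver may independently miss it."""
    msg_id = (sender.name, seqno)
    sender.receive(msg_id)
    for m in members:
        if m is not sender and rng.random() >= loss_rate:
            m.receive(msg_id)
    return msg_id


def reconcile(member, digest, by_name):
    """Compare a received history with our own and solicit what is missing."""
    for msg_id in sorted(digest):
        if msg_id in member.delivered or msg_id in member.gaps:
            continue
        origin = by_name[msg_id[0]]      # original sender of the message
        if msg_id in origin.buffer:
            origin.buffer[msg_id] = 0    # a request resets the idle counter
            member.receive(msg_id)
        else:
            member.gaps.add(msg_id)      # already garbage collected: report a gap


def gossip_round(members, gc_rounds, rng):
    """Stage 2: one anti-entropy round, followed by garbage collection."""
    by_name = {m.name: m for m in members}
    for m in members:
        peer = rng.choice([p for p in members if p is not m])
        mine, theirs = set(m.buffer), set(peer.buffer)
        reconcile(peer, mine, by_name)
        reconcile(m, theirs, by_name)
    for m in members:
        for msg_id in list(m.buffer):
            m.buffer[msg_id] += 1
            if m.buffer[msg_id] >= gc_rounds:
                del m.buffer[msg_id]     # not requested for gc_rounds rounds


if __name__ == "__main__":
    rng = random.Random(1)
    group = [Member(f"m{i}") for i in range(8)]
    msg = multicast(group, group[0], seqno=1, loss_rate=0.5, rng=rng)
    for _ in range(10):
        gossip_round(group, gc_rounds=4, rng=rng)
    for m in group:
        if msg in m.delivered:
            print(m.name, "delivered")
        else:
            print(m.name, "gap" if msg in m.gaps else "unaware")
```

Lowering `gc_rounds` in this model makes gaps more likely, because the original sender may discard a message while other members are still gossiping about it.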
C.2 Usage
To use the Zbcast protocol, specify the "Zbcast" property on the command
line as follows (using the perf demo as an example):
perf -prog 1-n -add_prop Zbcast -groupd
This assumes that IP multicast is available in the underlying
network. Remember to set the related environment variables:
ENS_DEERING_PORT=38350
ENS_MODES=Deering:UDP
Otherwise, the Gcast layer needs to be linked into the stack:
perf -prog 1-n -add_prop Zbcast -add_prop Gcast -groupd
Note that in both cases groupd is needed to track group
membership information. This is a limitation of the current
implementation, not something intrinsic to the protocol. If
there is sufficient demand, we will remove this restriction.
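The two invocations above differ only in the extra Gcast property and the two environment variables. A launcher script could choose between them as in the following minimal Python sketch. The function name `zbcast_perf_invocation` and its parameters are hypothetical; the flags and variable values are copied from the commands shown in this section.

```python
import os


def zbcast_perf_invocation(ip_multicast_available, base_env=None):
    """Return (argv, env) for running the perf demo with the Zbcast property."""
    # Start from the caller's environment unless an explicit one is supplied.
    env = dict(os.environ if base_env is None else base_env)
    argv = ["perf", "-prog", "1-n", "-add_prop", "Zbcast"]
    if ip_multicast_available:
        # IP multicast case: set the environment variables listed above.
        env["ENS_DEERING_PORT"] = "38350"
        env["ENS_MODES"] = "Deering:UDP"
    else:
        # No IP multicast: link the Gcast layer into the stack instead.
        argv += ["-add_prop", "Gcast"]
    # groupd is required in both cases to track group membership.
    argv.append("-groupd")
    return argv, env
```

Assuming the perf binary is on the PATH, the result can be passed to `subprocess.run(argv, env=env)`.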
Message losses are reported to the application via the Up(ELostMessage)
event. The application can either ignore the losses
(e.g., multimedia applications) or leave the process groups and then
rejoin them, triggering state transfer.
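The two reactions to a loss report can be sketched as a small policy function. This is illustrative Python, not Ensemble's API: `LossPolicy`, `on_lost_message`, and the `leave` and `join` callbacks are hypothetical stand-ins for whatever the application uses to handle Up(ELostMessage) and to leave and rejoin its groups.

```python
from enum import Enum


class LossPolicy(Enum):
    IGNORE = "ignore"   # e.g. a multimedia stream that tolerates gaps
    REJOIN = "rejoin"   # leave and rejoin so that state transfer repairs the gap


def on_lost_message(policy, leave, join):
    """React to a loss report; leave and join are application-supplied callbacks."""
    if policy is LossPolicy.IGNORE:
        return "ignored"
    leave()             # hypothetical: leave the process groups
    join()              # hypothetical: rejoin, which triggers state transfer
    return "rejoined"
```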