LWG (light-weight groups)



next up previous
Next: FAST (message acceleration) Up: Layers and Protocols Previous: CLTSVR (client-server membership)

LWG (light-weight groups)

Another important scalability issue is scalability in the number of groups. When Isis was designed, it was envisioned that some tens, maybe hundreds of groups would suffice for all fault-tolerance and parallel execution needs. When Isis was distributed among users, it soon became clear that groups were used quite differently. Rather than using a group per fault-tolerant service, programmers created a group per fault-tolerant object. That is, they used the group paradigm for implementing individual objects. This way they did not have to deal with multiplexing requests for different objects over a single group, and could use the group to address the complete set of replicas and cached copies. This led to a serious problem of scale: if a server crashed, all objects that had a member on that server would start their own flush protocol, leading to a storm of redundant messages on the network.

To address this, Horus includes a protocol layer that multiplexes groups over a small set of core groups (much like light-weight threads in a small set of heavy weight processes). The approach yields a significant amortization of costs of failure recovery, and also improves other aspects of group communication [5]. For example, the cost of joining a new member to a light-weight group is very low, allowing for short-lived membership. Also, the cost of ordering protocols (such as causal or total ordering) underneath the light-weight groups is cheaper than if each group runs its own protocol.



Robbert VanRenesse
Tue Nov 15 12:09:10 EST 1994