Creating Reliable Distributed Systems in Java

Li, Zhongwei	460145	zl31@cornell.edu
Ma, Xiangjiang	458080	xm16@cornell.edu
Wang, Yiwen	503763	yw49@cornell.edu

The work is based on NinjaRMI and an extension of NinjaRMI from point-to-point to point-to-multipoint communication is made. Instead of sending a request to a single server, a client would transparently send a request to a server group. This adds failure tolerance and high-availability to NinjaRMI services. The replicated server group is integrated with JavaGroups to maintain state consistency. Highly fault tolerant behavior is demonstrated by running shared whiteboard program and performance is found almost to be the same.

Distributed object applications written in Java are typically implemented through Java Remote Method Invocation (RMI). RMI applications are often comprised of two separate programs: a server and a client. The server creates some remote objects, makes references to them accessible, and waits for clients to invoke methods on these remote objects. The client gets a remote reference to one or more remote objects in the server and then invokes methods on them. RMI provides the mechanism by which the server and the client communicate and pass information back and forth. There are two implementation of RMI. One is the RMI with JDK1.2 from Sun [4], another is the NinjaRMI from UC Berkeley [3].

RMI eases the development of distributed application, however, it does not directly improve the reliability of the applications. RMI applications are not reliable because a single failure of the registry or the server will lead to a failure of the whole application.

Building reliable distributed object applications is an interesting research area. But few researches have been done in developing fault tolerance techniques in RMI. There are currently two research groups are working on this area. One is from the Computer Science Department in New York University and Lucent Technologies Bell Laboratories [1]. Another is from the Computer Science Department in University of California at Berkeley [3].

NYU-Lucent group built replicated fault-tolerant Java server objects through a Java package called Filterfresh. They created a logical group by a GroupManager Object instantiated with each replica in order to maintain the correctness and integrity of replicated servers. A consistent group view and a reliable multicast mechanism were maintained by a Group Membership algorithm. In the implementation of Filterfresh, two step were taken. First, GroupManager class was used to construct a fault tolerant RMI registry. Second, a transparent client-side mechanism that enables application server failures to be tolerated at the stub layer was built. It is worth noting they have successfully implemented a fault tolerant RMI registry mechanism.

Berkeley group built a carefully controlled environment out of a collection of execution environments (Bases) and used dynamically generated code to introduce run-time-generated levels of indirection separating clients from services. They tried to address all of the difficult service fault-tolerance, availability, and consistency problems in the Base. A Java-based prototype Base implementation called MultiSpace was built and two novel services were implemented. The MultiSpace implementation consists of three layers: the MultiSpace layer on the top is a set of multiple node abstractions; the iSpace layer in the middle is a single-nod execution environment; and the NinjaRMI layer on the bottom is a set of communication primitives. The replicated registry is maintained through cooperation of all MultiSpaceLoaders in a cluster. Each node periodically sends out a multicast beacon that carries its list of local services to all other nodes in the MultiSpace, and also listens for multicast messages from other MultiSpace nodes. In such a way, each MultiSpaceLoader builds an independent version of the registry from these beacons. In the paper, the authors demonstrated that the applications built from their implementation of MultiSpace were fault-tolerant and scalable.

Both above approaches can improve reliability of the RMI service. However, both have to build complicated Java packages and also are not completely compatible with normal Java RMI. In order to support the existing normal RMI applications, considerable changes of the Java source code have to be made in their approach.

In this project, a relatively simple approach is taken to add transparent fault tolerance through replicant server groups. The registry table in the NinjaRMI registry server package is extended to store a group of duplicated remote objects. Registry server is responsible to detect the healthy server.

The design principle is to reuse the existing normal RMI applications as much as possible and add server fault-tolerance functionality in a transparent fashion to those applications. Available Java RMI package is extended to build highly available and reliable client-server applications. Instead of making a remote method call to a peer remote object (RMI server), a client would transparently make a call to a server group which contains a set of replicas of RMI servers. The registry sever returns the first response from those remote objects. In this way, when one or more servers are temporarily down, the client can still acquire available response from the other healthy server even without knowing where the failure points are. Therefore, reliable Java client-server applications can be built.

In order to implement above methodology, the way to pass the remote objects has to be designed and specified. Generally, remote objects are passed by reference. A remote object reference is a stub, which is a client-side proxy that implements the complete set of remote interfaces that the remote object implements. The stub code runs on the client and converts client requests for service functionality into network messages, potentially marshalling request parameters and unmarshalling the results. Normal Java stub is generated by the RMI compiler. But this approach can become a performance bottleneck and a single point of failure. The solution is to build a special stub to implement the architecture of multiple servers.

The base service of this special stub employs a similar technique to Java RMI except that the stub code contains embedded logic to select from amongst a set of servers. Failure can be corrected by reissuing failed or time-out service calls to an alternative replica server. Client application can be coded without knowing the redirected logic, and can simply assume that invocation to the service through the the stub is complete, or otherwise return an error if no back-end nodes can complete the request. This special stub is created by modifying the NinjaRMI compiler with a new option in order to maintain compatibility.

Ninja RMI package is chosen as the basis to implement fault-tolerance functionality. The whole package ninja.tar is downloaded from http://ninja.cs.berkeley.edu/. Since all source codes are included in the package, additional features, such as fault-tolerance functionality, can be added into this package. Available examples in the example folder can also be used to test the results.

The key idea in implementing the highly available and reliable RMI server is to create new registry hash table in registration server. In a normal registry hash table, one service name links to one remote RMI object (one to one), as shown in Table 3.1. In the new register hash table, one service name is related to a set of duplicated remote objects stored in a vector (one to many), as shown in Table 3.2 . Duplicated servers can be run in different machines. A runtime socket is created by the registry server to detect if the server is available or not.

New stub generated from the modified RMI compiler can get registry service after the RMI servers are running. When a server is suddenly down during a normal communication, a normal stub will throw an exception to the client application. The new stub, however, will catch the exception and contact the registry server. The registry server randomly hands out a remote object reference to a healthy remote object in the registry hash table. In such a way, client does not need to worry about what happens on the server side. Transparency is thus realized.

In order to test the modified NinjaRMI system we developed, we implemented a whiteboard application and integrated it with JavaGroups, which is developed in the computer science department in Cornell University [2]. Every server in the server group shares the same channel name in the JavaGroups. When a new server starts, it first checks the state by calling GetState() method. It will work as the first registered server if it finds no other server in the group, otherwise, it will work as a duplicated server in the server group. There are two situations when an active server changes its state. One is related to a immediate action such as clear() method is called in the whiteboard application. All severs in the same group will update their states immediately if such message is sent. The other is a in more general way. Every server in the same group will periodically check its state and update the state of all server if it finds its state is changed. By calling sendNewState() method, every state change in the server will be multicast to all other servers in the same Channel. In addition to this "periodical" updating method, we also implemented an "immediated" updating method to maintain the state consistency. In both way, every server in the group will share the same state. Even if one or more servers are temporarily down during normal communication process, the client can still get correct information from the next available server. State consistency of the server can thus be guaranteed.

Reliability of Java distributed applications is tested by by running the available examples from the modified NinjaRMI package. One of the test programs simply pings the server 10,000 times and then measures the average round-trip time. Following three steps are taken in running a typical NinjaRMI application

After registry sever is running, two or more server processes can get registered in the registry server. Test results show that after one or more servers are turned down, the client still can get the remote service from the first available remote object reference in the vector. All existing testing programs in the NinjaRMI package run well. This step demonstrates that modification of NinjaRMI registry server works.

Test results show that when the RMI server is turned down during normal communication process between client and server, the client can still get the remote service from the first available remote object bound in the vector. This process is done in a transparent fashion since the new stub is responsible for locating the available healthy server. In particular, we also developed a reliable whiteboard application based on a normal RMI application written by Prof. William Scherlis at CMU. All testing programs run successfully. This step demonstrates that modification of NinjaRMI stub works.

Modified NinjaRMI's performance is almost the same as that of the original NinjaRMI. On a Pentium-200Hz running NT4.0 with JDK 1.2, an average round-trip time is about 4.64ms over a local TCP socket for both the original and modified NinjaRMI package. This is understandable since the only "overhead" from the modified ninjaRMI is an additional detection step from the registry server. This step may take negligible time.

Building a simple server load balancing system is an interesting extension to this work. In the current implementation, the registry server randomly hands out a remote object reference to a healthy sever in the server group. Modification can be made to rotate the vector of replicated servers every time a remote object reference is handed out to a server. Another extension to this work is to creating reliable register server . This can be solved by using similar ideas from NYU-Lucent group, who has successfully implemented the fault-tolerant RMI registry mechanism. The third extension is durability. One server can act as a state-retriever which periodically retrieves the state from the primary and propagates it to all other backup servers. The primary logs all state changes. When it retrieves a GetState() it returns its current state and deletes the log. When it crashes before receiving the next GetState() request, its state can be reconstructed using the log.

We would like to express my appreciation to Professor Ken Birman for his supervisory on this project. We are mostly indebted to Dr. Bela Ban for his guidance of this project and his help in JavaGroups. We would also want to thank Mr. Matt Welsh for his help in configuring NinjaRMI and Prof. William Scherlis for his RMI whiteboard application.

Name	Remote Object
RemoteObjName-a	Obj-a
RemoteObjName-a	Obj-b

Name	Remote Object
RemoteObjName-a	Vector(Obj-a1, Obja2, Obj-a3, ...)
RemoteObjName-b	Vector(Obj-b1, Objb2, Obj-b3, ...)

Extension of RMI with Group Communication