Phase IV: Availability Through Active Replication

Due: 11:59pm Monday, 11/23/2009

General Instructions. Students are required to work together in teams. You may work in a team you used for an earlier phase or you may form a new team. An assignment submitted on behalf of a "team" having fewer than 2 or more than 5 students will receive a grade of F. All members of the team are responsible for understanding the entire assignment.

This assignment expects you to extend the code for some Phase I implementation. Feel free to use code or ideas from any team's solution to any prior phase in designing those extensions. However, submissions that contain extraneous code (such as code needed for implementing earlier phases but not needed in this phase) will be penalized.

No late assignments will be accepted.

Academic Integrity. Collaboration between groups is prohibited and will be treated as a violation of the University's academic integrity code.

Background: Enhanced Availability and Fault-tolerance

In the distributed banking system you built for Phase III, bank accounts (and the funds they store) are unavailable from the instant that the primary server fails until clients of the service have been informed of the new primary's identity. Moreover, the primary/backup approach cannot tolerate Byzantine failures. Active replication (also known as the "state machine approach"), though a bit more expensive, does not exhibit these limitations.

In this phase of our CS514 project, you will employ active replication, replacing each branch server with a service that has equivalent semantics but exhibits higher availability. This phase will thus give you hands-on experience designing a service that implements active replication.

What to Build

For this phase, you are given considerable latitude concerning assumptions you make about the environment and protocols you employ in implementing your system. At a minimum, however, assume a computing environment in which:

As in previous phases, feel free to employ a single real computer in order to simulate all of the hosts and the network being used by your system.

In addition to the above assumptions, define the other aspects of your computing environment by making choices for the following computing environment characteristics.

Comments about State Machine Protocols. A branch server can be replicated by running a replica of it on hosts at other branches. There are two ways to support a replica of the branch server for branch B1 (say) on the host for branch B2 (say):

  1. Modify the branch server at branch B2. The modified branch server continues to process the operations (commands) that it always has (e.g., Deposit, Withdraw, Query, Transfer for that branch's accounts), and it also handles these operations for branch B1. Effectively, we are merging new branch server replicas with the branch server that already exists at each host.

  2. Instantiate at branch B2 a copy of the branch server for branch B1, and associate this copy with a new socket at the host running branch server B2. Note that you may have to modify your "network wrapper class" to accommodate any additional sockets now in use.
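As an illustration of option 1 only, here is a minimal sketch of a merged server that keeps one deterministic state per branch and routes each command by branch. The names `BranchState` and `MergedBranchServer`, the account-number format, and the choice to let Withdraw overdraw an account are all assumptions made for this example, not part of the assignment. Transfer is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class BranchState:
    """Account balances for one branch (hypothetical representation)."""
    balances: dict = field(default_factory=dict)

    def apply(self, op, account, amount=0):
        # Deterministic: replicas that apply the same commands in the
        # same order end up in the same state.
        balance = self.balances.get(account, 0)
        if op == "Deposit":
            balance += amount
        elif op == "Withdraw":
            balance -= amount  # assumption: overdrafts are not rejected
        elif op != "Query":
            raise ValueError(f"unknown operation: {op}")
        self.balances[account] = balance
        return balance


class MergedBranchServer:
    """Option 1: one server process holds replicas for several branches."""

    def __init__(self, branch_ids):
        self.replicas = {b: BranchState() for b in branch_ids}

    def handle(self, branch_id, op, account, amount=0):
        # Route the command to the state of the branch that owns the account.
        return self.replicas[branch_id].apply(op, account, amount)


# The server on the host for B2 also acts as a replica for B1.
server_at_b2 = MergedBranchServer(["B1", "B2"])
server_at_b2.handle("B1", "Deposit", "B1.00001", 100)
server_at_b2.handle("B2", "Deposit", "B2.00007", 40)
b1_balance = server_at_b2.handle("B1", "Withdraw", "B1.00001", 30)
```

Under option 2, the same `BranchState` objects would instead live in separate server instances, each bound to its own socket.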

As far as the state machine approach is concerned, each branch GUI should be viewed as a client. This means that the branch GUI will now receive multiple responses for each operation it initiates and, therefore, must convert those responses into a single one (which is then presented to the human user). Because the mechanism for combining responses resides in the branch GUI, that mechanism never experiences failures.
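One possible way to combine responses is a simple vote. The sketch below assumes a configuration of 2f + 1 replicas that must tolerate f Byzantine failures, so a value reported by f + 1 replicas was reported by at least one correct replica. The function name `combine_responses` and the representation of responses as hashable values are assumptions for this example; if only crash failures are assumed, passing f = 0 accepts the first response.

```python
from collections import Counter


def combine_responses(responses, f):
    """Return the value reported by at least f + 1 replicas.

    Returns None when no value has that many votes yet, in which case
    the caller should keep waiting for more responses.
    """
    if not responses:
        return None
    value, votes = Counter(responses).most_common(1)[0]
    return value if votes >= f + 1 else None


# Three replicas, one of which (possibly faulty) disagrees.
agreed = combine_responses([70, 70, 13], f=1)
# Only two responses so far, and they conflict: no decision yet.
pending = combine_responses([70, 13], f=1)
```

The GUI would call such a function each time a new response arrives and display the result as soon as it is no longer None.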

Recall that in processing a Transfer operation, one branch server invokes an operation at another. Think carefully about how best to handle this once branch servers are replicated. Although there is only one level of nested call by servers in the system, try to devise a solution that scales up, gracefully supporting multiple levels of nested calls. The naive solution (having each server replica make a separate request to each other server replica) is workable, but more scalable solutions will receive higher grades.
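Whatever design is chosen, a replica that receives a nested request from several caller replicas must execute it only once. The sketch below illustrates that bookkeeping for the naive solution; it is not a more scalable design. It assumes that every caller replica tags its copy of the request with the same request identifier, and that f + 1 matching copies are required before execution. The class name `NestedCallFilter`, the identifier format, and the replica names are hypothetical.

```python
class NestedCallFilter:
    """Decides when a replicated nested request should be executed."""

    def __init__(self, f):
        self.f = f
        self.senders = {}      # (request_id, payload) -> set of caller ids
        self.executed = set()  # request ids that were already executed

    def receive(self, request_id, payload, sender):
        """Return True exactly once per request_id: at the moment f + 1
        distinct caller replicas have sent the same payload."""
        if request_id in self.executed:
            return False  # duplicate of a request already executed
        voters = self.senders.setdefault((request_id, payload), set())
        voters.add(sender)
        if len(voters) >= self.f + 1:
            self.executed.add(request_id)
            return True
        return False


# Three replicas of B1 forward the deposit half of the same Transfer.
nested_filter = NestedCallFilter(f=1)
deposit = ("Deposit", "B2.00007", 25)
decisions = [
    nested_filter.receive("t42", deposit, sender)
    for sender in ["B1-r0", "B1-r1", "B1-r2"]
]
```

Here `decisions` is `[False, True, False]`: the request runs when the second matching copy arrives, and the third copy is discarded. Copies with a different payload are counted separately, so a single faulty caller cannot trigger execution on its own.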

Submission Procedure. All submissions should be made through CMS. CMS provides a way for you to define your group. Be advised that each group member must take an action in creating a group, and your group cannot submit anything through CMS until the group has been created.

Submit the following files:

TEAM a .txt file that contains the names (and net-ids) of all team members. Also, for each team member, give a one- or two-paragraph description of the tasks this team member performed and the number of hours they required.

README a .txt file that contains

LOGIC a .txt file that contains

TestPlan a .txt file that describes the process and any tools (i.e. additional programs) you wrote in order to test your system. This file should also explain what tests you ran and why this was a reasonable set of tests to have run.

SourceCode a zip file containing the sources needed to compile and test your system.

Grading. Your grade will be based on the above documentation and the following elements: