General Instructions. Students are required to work together in teams. You may work in the team you used for Phase III or you may form a new team. An assignment submitted on behalf of a "team" having fewer than 2 or more than 5 students will receive a grade of F. All members of the team are responsible for understanding the entire assignment.
This assignment expects you to extend the code for some Phase I implementation. Feel free to use code or ideas from any team's solutions to any prior phase in designing those extensions. However, submissions that contain extraneous code (such as code needed for implementing Phases II and III but not needed in Phase IV) will be penalized.
No late assignments will be accepted.
Academic Integrity. Collaboration between groups is prohibited and will be treated as a violation of the University's academic integrity code.
The primary/backup approach is one way to enhance the availability of an application. A server (along with its state) is replicated; client requests are sent to and processed by the primary server, with state updates forwarded to all backup servers. If the primary server fails, then (i) one of the backup servers assumes the role of primary server and (ii) clients are notified about the identity of the new primary server.
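The flow described above can be sketched in a few lines. This is only an in-memory illustration, not a prescribed design: the names Replica, ReplicatedBranch, deposit, and fail_over are hypothetical, there is no networking, and promoting "the first live backup" is an assumed policy that your chosen protocol may replace.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a branch server's state (account -> balance)."""
    name: str
    balances: dict = field(default_factory=dict)
    alive: bool = True

    def apply(self, account, delta):
        self.balances[account] = self.balances.get(account, 0) + delta


class ReplicatedBranch:
    """Hypothetical coordinator: replicas[0] is the current primary."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    @property
    def primary(self):
        return self.replicas[0]

    def deposit(self, account, amount):
        # The primary processes the request, then forwards the state
        # update to every live backup before the reply is returned.
        self.primary.apply(account, amount)
        for backup in self.replicas[1:]:
            if backup.alive:
                backup.apply(account, amount)
        return self.primary.balances[account]

    def fail_over(self):
        # (i) Promote the first live backup (an assumed policy);
        # (ii) return its name so clients can be told who the new primary is.
        self.primary.alive = False
        self.replicas = [r for r in self.replicas if r.alive]
        if not self.replicas:
            raise RuntimeError("no live replica left")
        return self.primary.name
```

Because every update reaches the backups before the reply is returned, a backup promoted by fail_over already holds the balances that clients have observed.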
In this phase of our CS514 project, you will program such a primary/backup system, replacing each branch server by a service with equivalent semantics but exhibiting higher availability through what has been termed passive replication. Thus, this phase will give you an opportunity to master and get hands-on experience with a primary/backup protocol.
Run experiments to ascertain information you need in order for your protocols to exploit the synchronous model: message-propagation delays, processor speed differences, and clock synchronization errors. Employ either TCP/IP or (the less expensive) UDP as appropriate for communication between components of your distributed banking system.
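One possible way to gather message-propagation measurements is a UDP echo probe, sketched below. The function names (echo_server, measure_rtts, delay_bound) and the safety factor are assumptions made for illustration; in particular, the rule used to turn samples into a bound is a rule of thumb rather than a guarantee, and you should justify whatever bound you adopt from your own experiments.

```python
import socket
import time


def echo_server(sock, count):
    """Echo `count` datagrams back to their sender, then return."""
    for _ in range(count):
        data, addr = sock.recvfrom(1024)
        sock.sendto(data, addr)


def measure_rtts(server_addr, count=20, timeout=1.0):
    """Return round-trip times (seconds) for `count` UDP probes."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for i in range(count):
            start = time.perf_counter()
            s.sendto(str(i).encode(), server_addr)
            s.recvfrom(1024)  # raises socket.timeout if the probe is lost
            rtts.append(time.perf_counter() - start)
    return rtts


def delay_bound(rtts, safety_factor=2.0):
    """Assumed rule of thumb: one-way bound = safety_factor * max RTT / 2."""
    return safety_factor * max(rtts) / 2
```

Run echo_server on one host with a bound UDP socket, call measure_rtts from another host, and repeat under load, since the synchronous model needs a bound that holds in the worst case you intend to tolerate.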
Simulating Benign Failures. Extend the branch GUI from Phase I with a "button" that instigates the failure of all software running on the same host as this branch GUI, except for the branch GUI itself. Each component should, upon failure: halt, wait a random period (ranging from 15 seconds to 2 minutes), and then recover. An executing program infers the failure of a component either from direct interactions with that component or by consulting some form of failure detector service (which you must program and which must itself be available, so carefully consider where its components are executed). In either case, timeouts and/or the unexpected closing of a TCP/IP connection will likely be the basis for suspecting a failure. Upon recovery, the branch server should execute some "recovery code" that you provide.
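As a rough illustration of the two mechanisms above, the sketch below draws a random downtime in the required range and shows a timeout-based detector. HeartbeatDetector and its methods are hypothetical names, heartbeats are only one of several possible bases for suspicion, and the current time is passed in explicitly so the logic can be exercised without waiting.

```python
import random


def random_downtime(rng=random):
    """Seconds a failed component stays halted: 15 seconds to 2 minutes."""
    return rng.uniform(15.0, 120.0)


class HeartbeatDetector:
    """Suspect a component if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # component name -> time of last heartbeat

    def heartbeat(self, component, now):
        self.last_seen[component] = now

    def suspected(self, now):
        # A suspicion is only a guess based on a timeout; the timeout
        # should exceed the delay bound established by your experiments.
        return sorted(c for c, t in self.last_seen.items()
                      if now - t > self.timeout)
```

In a running system the now argument would typically come from time.monotonic(), and a halted component would sleep for random_downtime() seconds before executing its recovery code.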
Primary/Backup Protocols. There are many primary/backup protocols, and you are free to choose among them. Schemes that involve a single backup are somewhat simpler to build than those that involve two or more backups, because the former can employ simpler failure-detection and fail-over schemes whereas the latter often require some form of agreement protocol.
Choose one of the primary/backup protocols described or cited in Chapter 8 of Mullender or some other course text (and employ either one or two backups) or adapt chain replication (with a total of 3 chain elements) for the banking application.
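If you opt for chain replication, the basic message flow with 3 chain elements looks roughly like the sketch below: updates enter at the head and propagate to the tail, which generates the reply, while queries are answered by the tail. ChainNode and build_chain are hypothetical names, calls stand in for messages, and failure handling (removing a failed element and reconnecting the chain) is deliberately omitted.

```python
class ChainNode:
    """One element of a replication chain holding account balances."""

    def __init__(self, name):
        self.name = name
        self.balances = {}
        self.successor = None  # next element in the chain, None for the tail

    def update(self, account, delta):
        # Apply locally, then pass the update down the chain.
        self.balances[account] = self.balances.get(account, 0) + delta
        if self.successor is not None:
            return self.successor.update(account, delta)
        return self.balances[account]  # the tail generates the reply

    def query(self, account):
        # Queries should be sent to the tail only.
        return self.balances.get(account, 0)


def build_chain(names):
    """Link the nodes in order: first name is the head, last is the tail."""
    nodes = [ChainNode(n) for n in names]
    for node, nxt in zip(nodes, nodes[1:]):
        node.successor = nxt
    return nodes
```

For example, head, middle, tail = build_chain(["head", "middle", "tail"]) yields a chain in which head.update("acct", 100) returns only after all three elements hold the new balance.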
The branch server (as opposed to the entire distributed banking system) is the software component to which the primary/backup protocol should be applied. For each branch server, support a highly-available branch service by deploying and running additional replicas (as backup servers) for that branch server. The primary server and the backup servers should each be deployed on distinct existing processors of the distributed banking system. Choose processors that are able to communicate with each other directly, stipulating constraints on the topology of the network as necessary.
Recovery. Design and implement a recovery protocol so that a failed branch server replica, upon recovery, returns to service as a backup server in the same highly-available branch service in which it previously participated.
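One simple shape such a recovery protocol might take is a state transfer from the current primary, sketched below. BranchReplica, snapshot, and recover are hypothetical names, and the sketch assumes no updates are processed while the snapshot is being copied; a real protocol must also handle updates that arrive during the transfer.

```python
import copy


class BranchReplica:
    """A branch server replica with a count of updates applied so far."""

    def __init__(self, name):
        self.name = name
        self.role = "backup"
        self.balances = {}
        self.seq = 0  # number of updates applied

    def apply(self, account, delta):
        self.balances[account] = self.balances.get(account, 0) + delta
        self.seq += 1

    def snapshot(self):
        # Deep copy so the recipient does not share state with the sender.
        return self.seq, copy.deepcopy(self.balances)


def recover(replica, primary, backups):
    """Bring `replica` up to date and re-enlist it as a backup."""
    replica.seq, replica.balances = primary.snapshot()
    replica.role = "backup"
    if replica not in backups:
        backups.append(replica)
```

The sequence number lets the recovered backup tell which forwarded updates are already reflected in the snapshot and which it must still apply.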
Submission Procedure. All submissions should be made through CMS. CMS provides a way for you to define your group. Be advised that each group member must take an action in creating a group, and your group cannot submit anything through CMS until the group has been created.
Submit the following files (at least):
Grading. Your grade will be based on the above documentation and the following elements: