Phase II: Bank-Balance Computation
Due: by the start of class, Tuesday March 8, 2005.
General Instructions.
Students are strongly encouraged to work together in teams.
You may work in the team you used for Phase I or you may form a new team.
.
All members of the team are responsible for understanding the entire
assignment.
This assignment builds on work done in Phase I.
If your solution to Phase I was unstable, you have our permission to use some
other team's solution to Phase I as the basis for your
solution to Phase II. Vivek can suggest a team to approach if you don't
know of any group with a rock-solid Phase I solution.
No late assignments will be accepted.
Academic Integrity. Collaboration between groups is
prohibited (other than using someone else's Phase I solution), and will be treated as a violation of the University's
academic integrity code. If you did use someone else's Phase I solution,
indicate this when you hand in your Phase II results.
Background: Computing a Bank Balance
Once upon a time, banks opened at 9am and closed at 3pm.
This left plenty of time each day for that day's transactions to be processed,
and doing that processing was easy because
the bank's accounts were not being changed by concurrent customer activity.
A 21st Century bank will likely be open to its customers 24 hours a day.
Networked computers should allow transactions to reach the bank's branches from anywhere
in the world that is "wired".
But, suitable protocols must be employed, so that customer transactions can run
concurrently with the bank's back-office daily processing.
In this phase of the cs514 project, you will program a distributed snapshot algorithm that
computes the assets of the distributed bank implemented for phase I, and does so
while the bank is processing transactions.
Doing the exercise should:
-
Help you to master ideas we have been discussing in class: consistent cuts and
snapshot algorithms.
-
Give you an opportunity take the description of
an algorithm that might be found in a scientific journal
and translate it to an actual implementation.
-
Force you to contemplate issues associated with testing distributed
protocols.
What to Build
Extend the branch GUI from Phase I
with a "button" that allows a user to request an all-bank snapshot.
Activating this button causes a message to be sent to the
local branch server.
That message should lead to taking a global
snapshot of (i) all non-zero account balances at all branches
and (ii) any in-process Transfer operations.
This global snapshot should be displayed in pieces,
with a separate piece being displayed by the (local) branch GUI for
each of the branches.
Specifically, at each branch, that branch's GUI should display:
-
The list all the accounts having non-zero balance and
maintained by that branch, along with the balance in each.
-
For each communications link (as specified in the network topology file) into
that branch, list all
the funds transfers that were in-process when the all-bank snapshot was requested.
Note that, for this application,
snapshot information is not being sent from branch to branch and is not
being assembled at a single process.
In class, we discussed a simplified version of the Chandy-Lamport snapshot
algorithm during lecture 6. The algorithm lets us record the state of each
process in a system, and the contents of messages in the channels between
processes, such that the state we collect is one that "could" have arisen at
some instant in real-time. Since the book mentions this algorithm but
doesn't cover it in detail, here's a reference to the actual paper:
-
K.M. Chandy and L. Lamport.
Distributed snapshots: Determining global states of distributed systems.
ACM Transactions on Computer Systems
-
Vol. 3, No. 1, pp 63-75.
First, we want you to think about a basic question: is the Chandy-Lamport
notion of global state needed for our banking snapshot algorithm? We
won't insist that you use this solution if you consider it too complex for the
purpose, or too complex to implement, or feel that it solves the wrong problem.
But we do insist that your solution be a correct one.
Remember that UDP communication is unreliable and that our communications graph
doesn't support bi-directional communication. Thus while we can be
"inspired" by the algorithm, we can't implement it literally the way that the
paper does; their version assumed TCP-style connections that are lossless, FIFO
and bi-direction. In our setting, if branch A can send messages directly
to branch B, it doesn't follow that B can send messages directly back to A.
In fact, to implement this snapshot
protocol you will need to design an "end to end" solution running over UDP
and gives you properties similar to the ones that Chandy-Lamport assumes!
Hint: consider implementing a new "inter-branch communications layer" that (1)
allows all branches to communicate with all others, (2) delivers messages
reliably and in FIFO order, and (3) runs over our constrained UDP layer.
Then run all branch-to-branch communication, including the snapshot
algorithm, over this new layer.
You may assume that no branches will crash while running the snapshot procedure.
However, you should design your solution so that it CAN tolerate UDP packet loss
or out-of-order delivery by UDP. You should invent a testing procedure
that will let you confirm that your solution works properly if this kind of
problem arises.
Submission Procedure.
Create a directory containing the files you wish us to grade. Use Winzip to
create an archive of this file and email the result to
Vivek.
Should you wish to revise your submission after you have copied it to our folder,
then simply correct the files and resubmit the entire directory.
We will grade only the last submission of a series.
No late submissions will be accepted.
Your directory should contain the following files (at least):
-
TEAM which contains the names (and net-ids) for all team members.
Also, for each team member give a 1 or 2 paragraph description of the tasks
this team member performed and the number of hours this required.
-
README which contains
- The names and a description of the contents for the other files in the directory.
- Instructions for installing, compiling, and running your software on our
Windows-NT system.
- A tutorial that the grader can follow to start your software and to convince
himself that your system implements the required functionality. Expect the grader
to spend at most 10 minutes on this task.
- The list of authors on the team that developed the Phase I software your
Phase II solution builds on.
-
TOPO should specify an interesting interconnection topology for a multi-branch bank
that will be used to illustrate the operation of your system.
-
TOPOexp should specify what requirements the topology must satisfy in order
for your snapshot algorithm to work correctly.
For example, do you require that every
link be bi-directional?
Obviously, imposing fewer restrictions is better.
-
TestPlan should describe the process and any tools (i.e. additional programs)
you wrote in order to test your system.
-
C# source file that contain the C# source needed to compile and test your system.
Grading.
Your grade will be based on the following elements:
-
Does your system correctly implement the snapshot algorithm and compute
consistent cuts for system operation if:
-
(Satisfactory)
At most one snapshot from at most one branch GUI is in progress at any time.
While that snapshot operation runs, new Deposit, Withdraw,
Query, or Transfer operations are delayed
and not processed by branch servers.
-
(Good)
Multiple concurrent snapshots from different branch GUI's can be in progress at any time.
The snapshots each proceed independently.
While the snapshot operation runs, new Deposit, Withdraw,
Query, and Transfer operations are not delayed by branch servers.
-
(Excellent)
Multiple concurrent snapshots from different branch GUI's can be in progress at any time.
The snapshots do not necessarily proceed independently, sharing work when this is
possible.
While the snapshot operation runs, new Deposit, Withdraw,
Query, and Transfer operations are processed by branch servers.
-
How easy is it to follow the README file installation and sample-execution script.
-
How thorough was your testing procedure and how creative you were in building
sufficient scaffolding (test drivers etc) to test your system.
-
Is the source code easy to understand and does it exhibit good structure?