![]() |
CS/CIS530 S04: Architecture of Large-Scale Information Systems |
|
Prelim 1This is a take-home prelim due by midnight Friday, 19 March 2004. Please submit a PDF to CMS. The exam is open book, open notes. You may discuss problems if you wish. Your solution should reflect your own work; solutions on different exams should not appear too similar. If you discuss a problem with someone, please reference that person on your solution. Problem 1: 2PC (20 pts)
Consider a distributed TP system comprising a commit coordinator
and
(a) What is the expected latency of a successful 2PC involving all
(b) Why doesn't this imply an upper bound of the throughput of a distributed TP system?
Assume a simple symmetric system in which
the values of
(c) What is the expected number of transactions executing 2PC at any given time,
as a function of the system Transactions Per Second (TPS) and
the
(d) Suppose coordinator failures are uniformly distributed over time.
What is the expected number of in-doubt transactions that will result from
a coordinator failure?
Again, express the answer as a function of TPS and
the (e) What is the (bad) effect of locating the participants far away from the coordinator?
Problem 2: SAGAs (15 pts)We have discussed business transactions consisting of multiple ACID transactions, each with a compensating transaction that can be applied (in reverse order) to roll back the business transaction if it fails. Such transactions are called Sagas in the research community. We remarked that sometimes the obvious compensating action may not be possible. For example, if an original action adds $10 to my checking account, the compensating action of subtracting $10 may be infeasible, because business transactions run without isolation, allowing me to make a $7 withdrawal concurrently with the business transaction’s forward execution. Consider a business transaction that does multiple transfers; for example subtract $10 from account A There are normal reasons to abort (such as finding no widgets left in the warehouse) and exceptional ones (such as an application crashing). Ideally, we would like to implement the business transaction so all the above actions happen (in some order) as separate ACID transactions, and we are able to roll back in response to any (normal or exceptional) condition. Is this possible? If so, show how. Otherwise, do as well as you can, describe the situations under which you cannot roll back cleanly, and explain the problem. Problem 3: Partitioning vs. Replication (25 pts)
Suppose your database server has a capacity of at most
of read and write accesses.
Despite the current state of the economy,
your business is growing,
and the offered read and write load on your database
(
Replication: Replicate all your data on all
Partitioning: Assign each row of data to one server
according to some hash function applied to its primary key:
if
(a) What is the capacity of an Note the simple convex combination description used above to describe a single server doesn’t work for the partitioning scheme without some assumptions about the distribution of write keys, and it doesn’t work at all for the replication scheme. The answer is a bit more subtle than that. Discuss the scalability of the two schemes.
(b)
Suppose the offered read load includes some reads using a secondary key.
An individual database server is indexed on both keys,
and its capacity is described by a triple
(c) In the real world, failures happen. Obviously, the partitioning scheme described above cannot cope with a database failure. Describe a scheme that combines replication and partitioning to remedy this, while retaining some of the advantages of the pure partitioning scheme. Problem 4: Recovery Using Device State and Idempotency (20 pts)In Lecture 6 we discussed (forward) recovery for multi-transaction RPC when some actions might not be recoverable. There are two basic approaches: idempotency, and readable device state that reliably reflects the operations the device has performed. We gave an example with a simple device that performed exactly one operation per transaction. Now consider a more complicated example: a (high-end) ATM that (1) gives out some paper money; Can you extend our recovery example so it works properly with a device like this that makes multiple state transitions as part of the commit? If so, describe how to do it, including the assumptions you make about the behavior of the device state. If not, explain why. Problem 5: Session State (20 pts)Most eCommerce sites use cookies to help identify sessions, (though storing the entire session state in a cookie is rare). In lecture we discussed a fault-tolerance scheme using reliable multicast (e.g. the Spread toolkit) to maintain synchronized replicated copies of session state in all the application servers. With the same session state available to every application server, it might seem that the load balancer could ignore session information in connection requests, e.g. routing each new connection request to the currently most-lightly-loaded server. This might allow the load balancer to avoid switching by layers 4-7, thus avoiding the expensive delayed binding required in order to examing higher-level protocol data. Unfortunately, with many browser clients it is possible to send two or more http requests (in the same cookie environment, therefore the same session) without waiting for a response from the server.
What problems could this create?
Suggest one or more possible efficient solutions.
|