CS/CIS530 S04: Architecture of Large-Scale Information Systems

 
 

Prelim 1

This is a take-home prelim due by midnight Friday, 19 March 2004. Please submit a PDF to CMS.

The exam is open book, open notes. You may discuss problems if you wish. Your solution should reflect your own work; solutions on different exams should not appear “too similar.” If you discuss a problem with someone, please reference that person on your solution.

Problem 1: 2PC (20 pts)

Consider a distributed TP system comprising a commit coordinator and n resource managers (participants). Let

  • RTT[i] be the expected rount-trip time for a message exchange between the coordinator and participant i;
  • WT[i] be the expected time to write a log record at participant i;
  • WT[0] be the expected time to write a log record at the coordinator;
  • PT[i] be the expected time for participant i to attempt to prepare and determine its vote;
  • CT[i] be the expected time for participant i to commit from the prepared state.

(a) What is the expected latency of a successful 2PC involving all n participants?

(b) Why doesn't this imply an upper bound of the throughput of a distributed TP system?

Assume a simple symmetric system in which the values of WT[i], RTT[i], PT[i], and CT[i] are all independent of i (that is, the participants are indistinguishable), and every transaction involves all n participants.

(c) What is the expected number of transactions executing 2PC at any given time, as a function of the system Transactions Per Second (TPS) and the RTT, WT, PT, and CT values.

(d) Suppose coordinator failures are uniformly distributed over time. What is the expected number of in-doubt transactions that will result from a coordinator failure? Again, express the answer as a function of TPS and the RTT, WT, PT, and CT values.

(e) What is the (bad) effect of locating the participants far away from the coordinator?

Problem 2: SAGAs (15 pts)

We have discussed business transactions consisting of multiple ACID transactions, each with a “compensating transaction” that can be applied (in reverse order) to roll back the business transaction if it fails. Such transactions are called Sagas in the research community.

We remarked that sometimes the “obvious” compensating action may not be possible. For example, if an original action adds $10 to my checking account, the compensating action of subtracting $10 may be infeasible, because business transactions run without isolation, allowing me to make a $7 withdrawal concurrently with the business transaction’s forward execution.

Consider a business transaction that does multiple transfers; for example

subtract $10 from account A
add $10 to account B
subtract $5 from account C
add $5 to accound D
subtract 1 widget from warehouse
add 1 widget to truck

There are normal reasons to abort (such as finding no widgets left in the warehouse) and exceptional ones (such as an application crashing). Ideally, we would like to implement the business transaction so all the above actions happen (in some order) as separate ACID transactions, and we are able to roll back in response to any (normal or exceptional) condition.

Is this possible? If so, show how. Otherwise, do as well as you can, describe the situations under which you cannot roll back cleanly, and explain the problem.

Problem 3: Partitioning vs. Replication (25 pts)

Suppose your database server has a capacity of at most cr read accesses per second, or a smaller number cw of write accesses per second, or any convex combination

a * cw  +  (1-a) * cr

of read and write accesses.

Despite the current state of the economy, your business is growing, and the offered read and write load on your database (lr and lw, respectively) has reached its capacity. So you consider the following schemes for expanding to two (or in general to n) servers:

Replication: Replicate all your data on all n servers. Implement each write using 2PC to write every copy of the datum in the same distributed ACID transaction. (Your friendly sales rep assures you that the 2PC is no more expensive than the sum of the write transactions at each of the n servers.) Implement each read by choosing a server at random or by some more sophisticated load balancing scheme.

Partitioning: Assign each row of data to one server according to some hash function applied to its primary key: if hash(k) = i then the row with key value k is stored on database i.

(a) What is the capacity of an n-server system using each of these schemes, assuming all reads and writes are indexed by primary key? There is plenty of main memory available, so you may make the (only slightly unrealistic) assumption that the PK index is entirely cached in main memory at each server, and the individual server capacities are independent of the amount of data stored at them.

Note the simple “convex combination” description used above to describe a single server doesn’t work for the partitioning scheme without some assumptions about the distribution of write keys, and it doesn’t work at all for the replication scheme. The answer is a bit more subtle than that.

Discuss the scalability of the two schemes.

(b) Suppose the offered read load includes some reads using a secondary key. An individual database server is indexed on both keys, and its capacity is described by a triple cw, crp and crs, reflecting writes, reads by primary key, and reads by secondary key. How does the answer to part (a) change?

(c) In the real world, failures happen. Obviously, the partitioning scheme described above cannot cope with a database failure.

Describe a scheme that combines replication and partitioning to remedy this, while retaining some of the advantages of the pure partitioning scheme.

Problem 4: Recovery Using Device State and Idempotency (20 pts)

In Lecture 6 we discussed (forward) recovery for multi-transaction RPC when some actions might not be recoverable. There are two basic approaches: idempotency, and readable device state that reliably reflects the operations the device has performed.

We gave an example with a simple device that performed exactly one operation per transaction. Now consider a more complicated example: a (high-end) ATM that

(1) gives out some paper money;
(2) gives out some change; and
(3) prints a receipt.

Can you extend our recovery example so it works properly with a device like this that makes multiple state transitions as part of the commit?

If so, describe how to do it, including the assumptions you make about the behavior of the device state.

If not, explain why.

Problem 5: Session State (20 pts)

Most eCommerce sites use cookies to help identify sessions, (though storing the entire session state in a cookie is rare). In lecture we discussed a fault-tolerance scheme using reliable multicast (e.g. the Spread toolkit) to maintain synchronized replicated copies of session state in all the application servers.

With the same session state available to every application server, it might seem that the load balancer could ignore session information in connection requests, e.g. routing each new connection request to the currently most-lightly-loaded server. This might allow the load balancer to avoid switching by layers 4-7, thus avoiding the expensive “delayed binding” required in order to examing higher-level protocol data.

Unfortunately, with many browser clients it is possible to send two or more http requests (in the same cookie environment, therefore the same session) without waiting for a response from the server.

What problems could this create? Suggest one or more possible efficient solutions.
 

 

HOME | ANNOUNCE | ADMIN | SCHED | LECT | HW | PROJ | MAIL