CS/CIS530 S04: Architecture of Large-Scale Information Systems

 
 

Take-Home Final Exam

This exam will comprise 4 questions. Here are the first two (7:00 PM 14 May). The remaining ones will appear tomorrow AM.

This is a take-home final exam due by midnight Thursday, 20 May 2004 (that is, the midnight between Thursday and Friday). Please submit a PDF file to CMS.

Rules are the same as for the prelim: The exam is open book, open notes. You may discuss problems if you wish. Your solution should reflect your own work; solutions on different exams should not appear “too similar.” If you discuss a problem with someone, please reference that person on your solution.

Problem 1: Asynchronous Business Transactions

Consider the simple order processing protocol from Lectures 21-22. It involves three sites: the main site (where users connect), a shipping site and a billing site. We discussed this under the assumption that each site supported local transactions, but distributed transactions across the sites were not supported.

We are going to make three changes.

First, we introduce transactional messaging. We still do not support distributed transactions among the sites, but we introduce a shared message queue database. At any site, a process may dequeue an input message, perform some processing to update its local database, enqueue some output (reply) messages, then commit all these actions as a single ACID transaction. You can assume that once a message has been enqueued (and the transaction commits) the message will be reliably delivered to its intended recipient. In effect, the middleware at a particular site can act as a 2PC coordinator for transactions involving that site’s local database and the message queue database, but no other site’s local database.

Our second change is to allow a customer to reconnect to the main site and alter or cancel an outstanding order. Note that this change introduces the possibility of crossed messages as discussed briefly in Lecture 22.

The final change is to add an “undeliverable” notification from the shipping site, e.g. as the result of shipping to a nonexistent address. This requires compensation: at the very least, a refund to the customer’s credit card.

Under these new assumptions, give a complete design covering all scenarios, including the data stored at each site and the (synchronous and asynchronous) operations at each site. In particular, describe the compensation required when an order is undeliverable. Discuss changing or cancelling an order, including how to deal with crossed messages. Argue convincingly that there are no possible “race conditions” that could lead to an inconsistent state.

Problem 2: Database Connection Pooling

This question considers management of connections between the middle tier and the database: the connection pool manager. You should be familiar with connection pooling. A connection pool manager hands out connections to application code at an operation-by-operation granularity: each database access performed by the application code follows a pattern like

Connection c = acquirePooledConnection(...)

  doSQL(c, ...)
    or
  commit(c)
    or
  abort(c)

releasePooledConnection(c)

This facilitates re-use of connections, so the system can avoid expensive connection creation and teardown operations.
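To make the re-use point concrete, here is a minimal sketch of the acquire/release pattern. It deliberately ignores dbuid and txnid, so it is not a design for either kind of database discussed below. The names FakeConnection, ConnectionPool and max_idle are hypothetical stand-ins.

```python
import itertools

class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""
    _ids = itertools.count(1)

    def __init__(self):
        self.conn_id = next(FakeConnection._ids)

class ConnectionPool:
    """Keeps released connections idle so later acquires can re-use them."""

    def __init__(self, max_idle=4):
        self._idle = []
        self.max_idle = max_idle
        self.created = 0  # counts the expensive connection creations

    def acquire(self):
        if self._idle:
            return self._idle.pop()   # re-use: no creation cost
        self.created += 1
        return FakeConnection()       # expensive path

    def release(self, conn):
        if len(self._idle) < self.max_idle:
            self._idle.append(conn)
        # Otherwise the connection is dropped; a real pool would close it.

pool = ConnectionPool()
for _ in range(3):
    c = pool.acquire()
    # doSQL(c, ...) or commit(c) or abort(c) would go here
    pool.release(c)
print(pool.created)  # 1: three operations shared a single connection
```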

At any time an active connection to a database has the following two parameters:

  • dbuid, the database user id associated with the connection
  • txnid, the globally unique id of the current transaction

Database systems differ in their ability to set or change these parameters:

In a static system each new connection gets an authenticated dbuid which does not change for the life of the connection. Each transaction is associated with exactly one connection. Creating a connection implicitly starts a new transaction for it; committing or rolling back the transaction implicitly creates another transaction on the same connection; and closing a connection implicitly rolls back its current transaction.
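The static rules can be summarized in a toy model. This is only a sketch of the semantics stated above, not any real driver's API; the class StaticConnection and its log attribute are hypothetical.

```python
import itertools

_txn_ids = itertools.count(1)  # toy source of globally unique txnids

class StaticConnection:
    """Toy model: fixed dbuid, exactly one current transaction at a time."""

    def __init__(self, dbuid):
        self.dbuid = dbuid             # never changes after creation
        self.txnid = next(_txn_ids)    # creation implicitly starts a transaction
        self.is_open = True
        self.log = []                  # records how each transaction ended

    def commit(self):
        self.log.append(("commit", self.txnid))
        self.txnid = next(_txn_ids)    # implicitly starts another transaction

    def rollback(self):
        self.log.append(("rollback", self.txnid))
        self.txnid = next(_txn_ids)    # implicitly starts another transaction

    def close(self):
        # Closing implicitly rolls back the current transaction.
        self.log.append(("rollback", self.txnid))
        self.txnid = None
        self.is_open = False

c = StaticConnection("app")
first = c.txnid
c.commit()
second = c.txnid   # a different txnid on the same connection
c.close()          # rolls back the transaction identified by second
```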

In a transaction-dynamic system the dbuid associated with a connection cannot change, but a connection may switch dynamically among transactions (txnids) associated with its dbuid. (The database server internally enforces single-threaded semantics of individual transactions. This does not forbid multiple connections associated with the same txnid; conceptually it just requires that each operation at the database locks its associated transaction object to ensure mutual exclusion.)
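The parenthetical remark about locking the transaction object might look like the following sketch on the server side. It is a toy model under assumed names (Transaction, ToyServer, run_connection); threads stand in for distinct connections that share one txnid.

```python
import threading

class Transaction:
    def __init__(self, txnid, dbuid):
        self.txnid = txnid
        self.dbuid = dbuid
        self.lock = threading.Lock()  # one lock per transaction object
        self.ops = []

class ToyServer:
    def __init__(self):
        self.txns = {}

    def begin(self, txnid, dbuid):
        self.txns[txnid] = Transaction(txnid, dbuid)

    def execute(self, conn_dbuid, txnid, op):
        txn = self.txns[txnid]
        # A connection may only switch among transactions of its own dbuid.
        if txn.dbuid != conn_dbuid:
            raise PermissionError("txnid belongs to a different dbuid")
        with txn.lock:                # single-threaded semantics per transaction
            txn.ops.append(op)
            return len(txn.ops)

def run_connection(server, dbuid, txnid, label, n):
    # Each thread plays the role of one connection issuing n operations.
    for i in range(n):
        server.execute(dbuid, txnid, f"{label}-{i}")

server = ToyServer()
server.begin("t1", "app")
threads = [
    threading.Thread(target=run_connection,
                     args=(server, "app", "t1", name, 500))
    for name in ("connA", "connB")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(server.txns["t1"].ops))  # 1000
```

Operations from the two connections interleave, but each one runs while holding the transaction's lock, so no two operations of the same transaction execute at once.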

In a fully dynamic system the txnid and even dbuid associated with an open connection can be changed dynamically, allowing a single connection to be shared among multiple transactions and database users. (Oracle has fully dynamic connections to support connection pooling where both the client and the server are Oracle database instances.)

Obviously there is some cost associated with changing the transaction or user associated with a connection – particularly the first time a connection switches to a new dbuid, which may require a complete authentication. Nevertheless, the cost can be significantly less than the execution cost of tearing down a connection and creating a new one, or the resource cost of maintaining multiple open connections.

(a) It is common for an application to be associated with only one (or perhaps just a few) dbuids. That is, all database operations performed by the application use a dbuid that is associated with the application, but is not specific to the customer on whose behalf the operation is being performed. This is the way we used Oracle for the CS530 projects. Clearly, since the dbuid is always the same, a fully-dynamic connection scheme has no advantage over a transaction-dynamic one in this situation.

Under these conditions, describe how the connection pool manager in the application server would be implemented for a database supporting static connections and for a database supporting transaction-dynamic connections. You may assume that the current txnid is returned as part of each database operation, so the connection pool manager always knows the current txnid.

(b) Under the conditions of part (a), suppose you are trying to choose between a database that provides only static connections, and a (presumably more expensive) database that provides transaction-dynamic connections. How would you make the choice, without implementing both solutions and measuring them?

First, just state informally how a transaction-dynamic system might achieve higher performance.

Next, try to be more formal. This is a pretty open-ended question. There are several parameters of interest, including the number of open connections in the pool, the frequency with which connections are created and destroyed, and (in the dynamic case) the frequency with which connections switch among transactions. These are (roughly) the space and time costs of connection pool management, and they can trade off against one another. Note there are costs at both the app server and the database server.

The idea is to identify the important parameters of your application workload (involving its use of pooled connections and transactions). These parameters should enable you to write down expressions for expected space and time cost of connection pool management, as a function of the number of simultaneous logged-on customers at your site.

(c) Some application designers suggest assigning a distinct dbuid to each registered customer of your site. Database queries performed on behalf of a given customer use a database connection with that customer’s associated dbuid. The argument in favor of this approach is that database operations are performed with the minimum possible privilege – each database user (hence each customer) is given access only to the data she is entitled to see.

What problems can you see with this scheme? Specifically, discuss (i) the cost of connection pool management if connections are not fully dynamic, and (ii) the process of registering a new customer at the site.

Problem 3: Performance of Optimistic and Pessimistic Offline Locking

This question has a lot of math formatting, so it is here: (PDF).

Problem 4: Phishing?

Okay, this exam is really long enough. So this last question is a sort of essay question.

About once a day I receive an email purporting to be from some reputable e-commerce site like Amazon (choose your own favorite), saying something like “we may have had a security compromise; please follow this link to our highly secure credit card validation page and update your credit card information.” The majority of these scams are pretty easy to detect, but they are getting more sophisticated.

Once I can trick you into typing your Amazon password to my site, you are a victim of cybercrime. If I can successfully apply a man-in-the-middle attack (for example (PDF)), I can behave almost exactly like the real Amazon site would, and you may be none the wiser until your next credit card bill arrives.

So what should your site do to guard against this kind of attack against your valued customers? What could e-commerce sites do collectively?
 

 
