This exam will comprise 4 questions.
Here are the first two (7:00 PM 14 May).
The remaining ones will appear tomorrow AM.
This is a take-home final exam due by midnight Thursday, 20 May 2004
(that is, the midnight between Thursday and Friday).
Please submit a PDF file to CMS.
Rules are the same as for the prelim:
The exam is open book, open notes.
You may discuss problems if you wish.
Your solution should reflect your own work;
solutions on different exams should not appear too similar.
If you discuss a problem with someone,
please reference that person on your solution.
Problem 1: Asynchronous Business Transactions
Consider the simple order processing protocol from Lectures 21-22.
It involves three sites: the main site (where users connect),
a shipping site and a billing site.
We discussed this under the assumption that each site supported
local transactions, but distributed transactions across the sites
were not supported.
We are going to make three changes.
First, we introduce transactional messaging.
We still do not support distributed transactions among the sites,
but we introduce a shared message queue database.
At any site, a process may dequeue an input message,
perform some processing to update its local database,
enqueue some output (reply) messages,
then commit all these actions as a single ACID transaction.
You can assume that once a message has been enqueued
(and the transaction commits)
the message will be reliably delivered to its intended recipient.
In effect, the middleware at a particular site can act as a 2PC coordinator
for transactions involving that site’s local database
and the message queue database,
but no other site’s local database.
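One step of transactional messaging at a single site can be sketched as follows. This is a minimal illustration, not a design for the problem. It assumes that the site's local table and the message queue live in one SQLite database, so a single local transaction stands in for the middleware's 2PC across the local database and the shared queue database. The table names `queue` and `orders` and the functions `setup` and `process_one` are hypothetical.

```python
import sqlite3


def setup():
    # Assumption: one in-memory SQLite database holds both the site's local
    # table and the message queue. In the exam's setting these are separate
    # databases, and the site's middleware would run 2PC across them.
    db = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
    db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, dest TEXT, body TEXT)")
    db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
    return db


def process_one(db, site, fail=False):
    """Dequeue one message addressed to `site`, update the local table, and
    enqueue a reply to the main site, committing all three as one transaction."""
    db.execute("BEGIN")
    try:
        row = db.execute(
            "SELECT id, body FROM queue WHERE dest = ? ORDER BY id LIMIT 1", (site,)
        ).fetchone()
        if row is None:
            db.execute("ROLLBACK")
            return False
        msg_id, order_id = row
        db.execute("DELETE FROM queue WHERE id = ?", (msg_id,))              # dequeue input
        db.execute("INSERT INTO orders VALUES (?, 'billed')", (order_id,))   # local update
        db.execute("INSERT INTO queue (dest, body) VALUES ('main', ?)", (order_id,))  # enqueue reply
        if fail:
            raise RuntimeError("simulated failure before commit")
        db.execute("COMMIT")
        return True
    except Exception:
        db.execute("ROLLBACK")  # undo the dequeue, local update and enqueue together
        raise


db = setup()
db.execute("INSERT INTO queue (dest, body) VALUES ('billing', 'order-17')")
try:
    process_one(db, "billing", fail=True)
except RuntimeError:
    pass
# The failed attempt left no trace: the input message is still queued.
after_failure = db.execute("SELECT dest, body FROM queue").fetchall()
process_one(db, "billing")
after_success = db.execute("SELECT dest, body FROM queue").fetchall()
print(after_failure)  # [('billing', 'order-17')]
print(after_success)  # [('main', 'order-17')]
```

Because the dequeue, the local update and the enqueue commit or roll back together, a failure between them can neither lose the input message nor emit a reply for work that was never recorded.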
Our second change is to allow a customer to reconnect to the main site
and alter or cancel an outstanding order.
Note that this change introduces the possibility of crossed messages
as discussed briefly in Lecture 22.
The final change is to add an undeliverable notification
from the shipping site,
e.g. as the result of shipping to a nonexistent address.
This requires compensation: at the very least, a refund to the customer’s
credit card.
Under these new assumptions,
give a complete design,
including the data stored at each site
and the (synchronous and asynchronous) operations at each site,
covering all scenarios.
In particular, describe the compensation required when an order is undeliverable.
Discuss changing or cancelling an order,
including how to deal with crossed messages.
Argue convincingly that there are no possible race conditions
that could lead to an inconsistent state.
Problem 2: Database Connection Pooling
This question considers management of connections between the middle tier
and the database: the connection pool manager.
You should be familiar with connection pooling.
A connection pool manager hands out connections to application code
at an operation-by-operation granularity: each
database access performed by the application code follows a pattern like
    Connection c = acquirePooledConnection(...)
    doSQL(c, ...)    or    commit(c)    or    abort(c)
    releasePooledConnection(c)
This facilitates reuse of connections,
so the system can avoid expensive
connection creation and teardown operations.
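The calling discipline above, and why it saves connection creations, can be sketched as follows. `StubPool`, `acquire_pooled_connection`, `release_pooled_connection` and `pooled` are hypothetical names that mirror the pseudocode. The stub ignores dbuid and txnid entirely, so it shows only the acquire/release pattern and the reuse benefit, not a pooling policy for transactions that span several operations.

```python
from contextlib import contextmanager


class StubPool:
    """Hypothetical stand-in for a pool manager: keeps a free list of
    placeholder connections and counts how many it has ever created."""

    def __init__(self):
        self.free = []     # released connections available for reuse
        self.created = 0   # number of (expensive) connection creations

    def acquire_pooled_connection(self):
        if self.free:
            return self.free.pop()  # reuse: no creation cost
        self.created += 1
        return {"conn_no": self.created}  # placeholder for a real new connection

    def release_pooled_connection(self, conn):
        self.free.append(conn)


@contextmanager
def pooled(pool):
    """acquire ... release, with the release guaranteed even if the body raises."""
    conn = pool.acquire_pooled_connection()
    try:
        yield conn
    finally:
        pool.release_pooled_connection(conn)


pool = StubPool()
for _ in range(1000):
    with pooled(pool) as c:
        pass  # doSQL(c, ...), commit(c) or abort(c) would go here
print(pool.created)  # 1: a thousand sequential operations shared one connection
```

Wrapping acquire and release in a context manager ensures that an exception inside the database operation cannot leak a connection out of the pool.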
At any time, an active connection to a database has the following two parameters:
- dbuid, the database user id associated with the connection;
- txnid, the globally unique id of the current transaction.
Database systems differ in their ability to set or change these parameters:
In a static system each new connection gets an authenticated dbuid
which does not change for the life of the connection.
Each transaction is associated with exactly one connection.
Creating a connection implicitly starts a new transaction for it;
committing or rolling back the transaction implicitly creates another transaction
on the same connection; and closing a connection implicitly rolls back
its current transaction.
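The static semantics just described can be captured in a toy model. The class `StaticConnection` and the counter `_next_txnid` are hypothetical, and txnids are only unique within one process here. The model shows that the dbuid is fixed at creation, that the connection always carries exactly one current transaction, and that ending a transaction implicitly begins the next one.

```python
import itertools

_next_txnid = itertools.count(1)  # toy source of unique txnids (per process only)


class StaticConnection:
    """Toy model of a static connection: dbuid fixed at creation,
    exactly one current transaction at all times while open."""

    def __init__(self, dbuid):
        self.dbuid = dbuid               # authenticated once, never changes
        self.txnid = next(_next_txnid)   # creating a connection starts a transaction
        self.history = []                # (outcome, txnid) for each ended transaction

    def _end(self, outcome):
        if self.txnid is None:
            raise RuntimeError("connection is closed")
        self.history.append((outcome, self.txnid))

    def commit(self):
        self._end("commit")
        self.txnid = next(_next_txnid)   # implicitly starts the next transaction

    def rollback(self):
        self._end("rollback")
        self.txnid = next(_next_txnid)   # implicitly starts the next transaction

    def close(self):
        self._end("rollback")            # closing rolls back the current transaction
        self.txnid = None


c = StaticConnection("app_user")
t1 = c.txnid
c.commit()
t2 = c.txnid
c.close()
print(c.history)  # [('commit', 1), ('rollback', 2)]
```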
In a transaction-dynamic system,
the dbuid
associated with a connection cannot change,
but a connection may switch dynamically among transactions (txnids)
associated with its dbuid.
(The database server internally enforces single-threaded semantics
of individual transactions.
This does not forbid multiple connections associated with the same txnid;
conceptually, it just requires that each operation at the database
lock its associated transaction
object to ensure mutual exclusion.)
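The transaction-dynamic rules, including the per-transaction locking in the parenthetical above, can be modeled in a few lines. This is a sketch under stated assumptions: `Server`, `TxnDynamicConnection`, `switch_to` and `do_sql` are hypothetical names, and a Python lock stands in for the server's locking of the transaction object. Two connections with the same dbuid share one transaction, and an attempt to switch to another user's transaction is refused.

```python
import threading


class Server:
    """Toy database server. Each transaction records its owning dbuid and has
    its own lock, so operations on one transaction run one at a time even if
    they arrive over different connections."""

    def __init__(self):
        self.txns = {}  # txnid -> (owner dbuid, per-transaction lock, list of ops)
        self._next = 0
        self._mutex = threading.Lock()

    def begin(self, dbuid):
        with self._mutex:
            self._next += 1
            self.txns[self._next] = (dbuid, threading.Lock(), [])
            return self._next

    def execute(self, txnid, op):
        _owner, lock, ops = self.txns[txnid]
        with lock:  # lock the transaction object: mutual exclusion per txnid
            ops.append(op)


class TxnDynamicConnection:
    """dbuid fixed for the life of the connection; the current txnid can be
    switched among transactions owned by that same dbuid."""

    def __init__(self, server, dbuid):
        self.server, self.dbuid = server, dbuid
        self.txnid = server.begin(dbuid)

    def switch_to(self, txnid):
        owner = self.server.txns[txnid][0]
        if owner != self.dbuid:
            raise PermissionError(f"txn {txnid} belongs to {owner!r}, not {self.dbuid!r}")
        self.txnid = txnid

    def do_sql(self, op):
        self.server.execute(self.txnid, op)


server = Server()
c1 = TxnDynamicConnection(server, "app_user")
c2 = TxnDynamicConnection(server, "app_user")
shared = server.begin("app_user")
c1.switch_to(shared)
c2.switch_to(shared)  # two connections now carry the same transaction

workers = [
    threading.Thread(target=conn.do_sql, args=(f"op-{i}",))
    for i, conn in enumerate([c1, c2] * 10)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(server.txns[shared][2]))  # 20

try:
    c1.switch_to(server.begin("alice"))  # another user's transaction
except PermissionError as e:
    print("rejected:", e)
```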
In a fully dynamic system,
the txnid
and even the dbuid
associated with an open connection can be changed dynamically,
allowing a single connection to be shared among multiple transactions and
database users.
(Oracle has fully dynamic connections
to support connection pooling
where both the client and the server are Oracle database instances.)
Obviously, there is some cost associated with changing the transaction or user
associated with a connection,
particularly the first time a connection switches to a new dbuid,
which may require a complete authentication.
Nevertheless, the cost can be significantly less
than the execution cost of tearing down a
connection and creating a new one,
or the resource cost of maintaining multiple open connections.
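The asymmetry in switching cost can be made concrete with a toy accounting model of a fully dynamic connection. The class `FullyDynamicConnection` and both cost constants are hypothetical relative units chosen only for illustration, not measurements of any real system. The first switch to a given dbuid on a connection is charged as a full authentication; later switches to an already authenticated dbuid, or to a new txnid, are charged as cheap.

```python
class FullyDynamicConnection:
    """Toy model: both dbuid and txnid can change while the connection stays
    open. Cost constants are hypothetical relative units."""

    FULL_AUTH_COST = 10  # hypothetical: authenticating a dbuid on this connection
    SWITCH_COST = 1      # hypothetical: re-associating with a known dbuid/txnid

    def __init__(self, dbuid):
        self.dbuid = dbuid
        self.txnid = None
        self.authenticated = {dbuid}      # dbuids already authenticated here
        self.cost = self.FULL_AUTH_COST   # opening the connection authenticates dbuid

    def switch(self, dbuid, txnid):
        if dbuid in self.authenticated:
            self.cost += self.SWITCH_COST
        else:
            self.authenticated.add(dbuid)
            self.cost += self.FULL_AUTH_COST
        self.dbuid, self.txnid = dbuid, txnid


c = FullyDynamicConnection("alice")
c.switch("bob", 7)    # first time for bob on this connection: full authentication
c.switch("alice", 8)  # alice already authenticated here: cheap
c.switch("bob", 9)    # bob is now known too: cheap
print(c.cost)  # 22
```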
(a)
It is common for an application to be associated with only one
(or perhaps just a few) dbuids.
That is, all database operations performed by the application
use a dbuid
that is associated with the application,
but is not specific to the customer on whose behalf
the operation is being performed.
This is the way we used Oracle for the CS530 projects.
Clearly, since the dbuid
is always the same,
a fully-dynamic connection scheme has no advantage
over a transaction-dynamic one in this situation.
Under these conditions,
describe how the connection pool manager
in the application server would be implemented
for a database supporting static connections
and for a database supporting transaction-dynamic connections.
You may assume that the current txnid is
returned as part of each database operation,
so the connection pool manager always knows the current txnid.
(b)
Under the conditions of part (a),
suppose you are trying to choose between a database
that provides only static connections, and a (presumably more expensive)
database that provides transaction-dynamic connections.
How would you make the choice,
without implementing both solutions and measuring them?
First, just state informally how a transaction-dynamic system
might achieve higher performance.
Next, try to be more formal.
This is a pretty open-ended question.
There are several parameters of interest,
including the number of open connections in the pool,
the frequency with which connections are created and destroyed,
and (in the dynamic case)
the frequency with which connections switch among transactions.
These are (roughly) the space and time costs of connection pool management,
and they can trade off against one another.
Note that there are costs at both the app server and the database server.
The idea is to identify the important parameters of your application
workload (involving its use of pooled connections and transactions).
These parameters should enable you to write down
expressions for expected space and time cost of connection pool management,
as a function of the number of simultaneous logged-on customers at your site.
(c)
Some application designers suggest
assigning a distinct dbuid
to each registered customer of your site.
Database queries performed on behalf of a given customer
use a database connection with that customer’s associated dbuid.
The argument in favor of this approach is that database operations are performed
with the minimum possible privilege – each database user
(hence each customer) is given access
only to the data she is entitled to see.
What problems can you see with this scheme?
Specifically,
discuss (i) the cost of connection pool management if connections are not fully dynamic,
and (ii) the process of registering a new customer at the site.
Problem 3: Performance of Optimistic and Pessimistic Offline Locking
This question has a lot of math formatting,
so it is provided separately as a PDF.
Problem 4: Phishing?
Okay, this exam is really long enough.
So this last question is a sort of essay question.
About once a day I receive an email
purporting to be from some reputable e-commerce site
like Amazon (choose your own favorite),
saying something like
"We may have had a security compromise;
please follow this link to
our highly secure credit card validation page
and update your credit card information."
The majority of these scams are pretty easy to detect,
but they are getting more sophisticated.
Once I can trick you into typing your Amazon password into my site,
you are a victim of cybercrime.
If I can successfully apply a man-in-the-middle attack
(for example (PDF)),
I can behave almost exactly like the real Amazon site would,
and you may be none the wiser until your next credit card bill arrives.
So what should your site do to guard against this kind of attack
against your valued customers?
What could e-commerce sites do collectively?