Project 4
Transaction Simulation & Crash Recovery
Due Date: Monday, November 17, 1997
Introduction
In this assignment you will complete certain parts of a simple database
engine. The database is modeled very simply, and runs by executing a series
of transaction operations given to it by the user. Each transaction performs
a series of reads and writes on the pages of the database, and can commit
or abort at any time.
The simulator includes a recovery manager module that keeps a log of the
database's activity. At any point in time, the database may crash (specifically,
when it encounters a special command to induce a crash). The recovery manager
then performs a restart to restore the database to a correct state, using
the Aries recovery algorithm.
The Transaction Simulator
The transaction simulator works just like a mini-database. There is
a series of pages stored in a file on disk, accessed through a buffer.
Transactions are modeled as a sequence of reads and writes to the pages
of the database. A log is kept of all changes to database pages, with WAL
used to ensure that committed transactions are fully represented by the
log as written to stable storage. Checkpoints can be inserted at will.
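The WAL rule mentioned above can be sketched roughly as follows. This is only an illustration and not the simulator's code: SketchLog, SketchPage, and writePageToDisk are hypothetical names, and LSNs are assumed to be plain integers. The point is that the log is forced to stable storage up to a page's pageLSN before the page itself is written.

```cpp
#include <cassert>

// Hypothetical types for illustration only; the simulator's real classes differ.
using Lsn = int;

struct SketchLog {
    Lsn nextLsn = 0;      // LSN the next appended record will receive
    Lsn flushedLsn = -1;  // highest LSN known to be on stable storage

    // Append a record to the in-memory tail of the log and return its LSN.
    Lsn append() { return nextLsn++; }

    // Force every record up to and including `lsn` to stable storage.
    void flushUpTo(Lsn lsn) {
        assert(lsn < nextLsn);  // cannot flush a record that was never appended
        if (lsn > flushedLsn) flushedLsn = lsn;
    }
};

struct SketchPage {
    Lsn pageLsn = -1;    // LSN of the last update applied to this page
    bool dirty = false;  // true if the buffered copy differs from disk
};

// WAL rule: the log records describing a page must reach stable storage
// before the page itself does.
void writePageToDisk(SketchLog& log, SketchPage& page) {
    if (page.pageLsn > log.flushedLsn) {
        log.flushUpTo(page.pageLsn);
    }
    // ... the actual disk write of the page would happen here ...
    page.dirty = false;
}
```

A commit would force the log in the same way, up to the LSN of the commit record, before the transaction is reported as committed.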
The input comes either directly from the user or from an input file
in a predetermined format. The syntax for issuing commands to the simulator
is discussed in Getting Started with the Simulator.
A breakdown of the major components follows:
- BufMgr - This module operates much like the buffer you implemented
for project 1. When a page is to be read or written to, a pinPage() request
is made, and the page is, if not already present, loaded into the buffer
pool. On request, one or all pages can be flushed. Pages are released with
an unpinPage() call.
The buffer for this project is somewhat small, being intended to illustrate
the role of the buffer in crash recovery.
- Xaction_table - This keeps track of all the transactions currently
active in the database. It also keeps information on each active transaction,
namely the oldest and most recent log sequence numbers (LSNs) for each
transaction.
- The log subsystem:
- log - This, quite simply, is the log that is written to while the transactions
are executing. It provides simple sequential access to the log files for
reading purposes, and can append new records to the end of the log. The
log records are of uniform size, and the log does not concern itself
with the type of log record being written.
- logrecord - This represents a single log entry. A special field is
kept in each record to identify the variety of log record (commit, update,
etc.) that it is. It contains a generic data buffer, which holds the information
specific to each type of log record.
- masterlog - This keeps track of the checkpointing, so that the recovery
process will not need to go back too far to reconstruct the database at
the time of the crash.
- LogData structures - These classes represent each type of update, and
the information associated with it like prevLSN, xaction_id, and the page
affected. Each Log Data structure is stored in the data field of a log
record, and should be accessed by typecasting that data to the appropriate
LogData type (UPD, CLR, ABORT).
- Recovery Manager - consists of several related modules for logging
& recovery procedures. While the database is running, each transaction
has its own recovery manager that is responsible for logging its actions.
Only one recovery module, though, is necessary for performing crash recovery.
The pieces of the manager are:
- The logging functionality (logfunc.cpp) - This translates write requests
by transactions into Update records that are written to the log, and creates
CLR records whenever a transaction aborts on its own.
- rollback.cpp - This performs a rollback on a transaction that has just
aborted on its own, undoing the changes and making sure that none of them
persist even after a crash.
- restart.cpp - This is the part responsible for bringing the database
up to a consistent state following a crash. It performs the three phases
of the Aries recovery algorithm based on the information written to the
log.
- Checkpoint - This generates checkpoints and writes them to the log,
extracting the information from the Xaction table and getting the Dirty
Page Table from the Buffer.
- Recovery DirtyPageTable - This is the list of possibly dirty pages
that is built during the analysis phase of recovery.
- Recovery XactTable - This is the list of active transactions that is
built during the analysis phase of recovery.
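To make the logrecord and LogData descriptions above more concrete, here is a minimal sketch of a fixed-size record that carries a type tag and a generic data buffer. All names, fields, and sizes here (SketchLogRecord, SketchUpdData, the 64-byte buffer) are assumptions made for illustration; consult the project headers for the real layout. The project accesses the data by typecasting; this sketch copies with memcpy instead, which has the same effect while avoiding alignment concerns.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical layout for illustration; the real classes are in the project headers.
enum class SketchRecType { UPD, CLR, ABORT, COMMIT };

struct SketchUpdData {
    int prevLsn;    // previous log record written by the same transaction
    int xactionId;  // transaction that performed the write
    int pageId;     // page affected
    int oldValue;   // before-image, needed for undo
    int newValue;   // after-image, needed for redo
};

struct SketchLogRecord {
    SketchRecType type;      // tells the reader which LogData type is in `data`
    unsigned char data[64];  // generic payload; every record has the same size
};

static_assert(std::is_trivially_copyable<SketchUpdData>::value,
              "payload must be plain data to be copied byte-wise");
static_assert(sizeof(SketchUpdData) <= sizeof(SketchLogRecord::data),
              "payload must fit in the generic buffer");

// Pack update data into a generic record.
SketchLogRecord makeUpdateRecord(const SketchUpdData& upd) {
    SketchLogRecord rec{};
    rec.type = SketchRecType::UPD;
    std::memcpy(rec.data, &upd, sizeof upd);
    return rec;
}

// Recover the update data; only valid when the type tag says UPD.
SketchUpdData readUpdateData(const SketchLogRecord& rec) {
    assert(rec.type == SketchRecType::UPD);
    SketchUpdData upd;
    std::memcpy(&upd, rec.data, sizeof upd);
    return upd;
}
```

Checking the type tag before interpreting the buffer is the important habit: the log itself does not know which kind of record it is storing.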
The recmgr_tab.c module is responsible for parsing the input. It is
computer generated, and not very fit for modification. Don't worry too
much about what it does; just know that it takes one command at a time
from the standard input (which we might have redirected to point to a file)
and converts it into a database operation. If you'd like to modify it in
any way, let me know and I'll see what I can do.
The handle.cpp module contains the standalone functions for performing
the commands. There is one for each operation; these are the "top-level"
functions that are called first when an operation is performed. The functions
in handle.cpp are the functions that will call the Recovery Manager functions
that you will be finishing, and will pass in the relevant data about the
operations.
Your Mission
You will implement various pieces of the recovery module. Specifically,
you will implement the functionality to handle the most basic and common
of transactions, the write. You must add code to the logging mechanism
so that writes (also called updates) to the pages of the DB are
reflected in the log, and then implement those parts of the restart mechanism
that deal specifically with those log entries to restore the database to
a consistent state.
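As a rough picture of what "reflecting a write in the log" involves, the sketch below logs and applies a single write against a toy in-memory database. It is not the project's API: SketchDb, logAndApplyWrite, and every field name are hypothetical, pages hold a single integer, and an LSN is simply the record's index in the log. The bookkeeping it shows (chaining prevLSN, updating the transaction's LSNs, recording the page as dirty, stamping the pageLSN) follows the usual Aries scheme.

```cpp
#include <cassert>
#include <map>
#include <vector>

using Lsn = int;
const Lsn kNoLsn = -1;

// Hypothetical, simplified stand-ins for the simulator's structures.
struct SketchUpdRec { Lsn lsn; Lsn prevLsn; int xid; int pageId; int oldVal; int newVal; };
struct SketchXact   { Lsn firstLsn = kNoLsn; Lsn lastLsn = kNoLsn; };
struct SketchPage   { int value = 0; Lsn pageLsn = kNoLsn; };

struct SketchDb {
    std::vector<SketchUpdRec> log;         // the log; index doubles as LSN
    std::map<int, SketchXact> xactTable;   // xid -> oldest and most recent LSN
    std::map<int, SketchPage> pages;       // pageId -> buffered page
    std::map<int, Lsn> dirtyPages;         // pageId -> recLSN (first LSN that dirtied it)
};

// Log one write by transaction `xid` to page `pageId`, then apply it.
Lsn logAndApplyWrite(SketchDb& db, int xid, int pageId, int newVal) {
    SketchXact& x = db.xactTable[xid];
    SketchPage& p = db.pages[pageId];

    // 1. Append the update record, chained to the transaction's previous record.
    Lsn lsn = static_cast<Lsn>(db.log.size());
    db.log.push_back({lsn, x.lastLsn, xid, pageId, p.value, newVal});

    // 2. Update the transaction table.
    if (x.firstLsn == kNoLsn) x.firstLsn = lsn;
    x.lastLsn = lsn;

    // 3. Record the page as dirty; emplace leaves an existing recLSN untouched.
    db.dirtyPages.emplace(pageId, lsn);

    // 4. Apply the change and stamp the page with the record's LSN.
    p.value = newVal;
    p.pageLsn = lsn;
    return lsn;
}
```

Note that the record is appended before the page is changed, so the before-image is captured from the page's current contents.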
The code you will write belongs in these modules:
logfunc.cpp - Implement the WriteUpdateLog() function. Given the information
about a given update, generate an update log record, and update all affected
information in the rest of the database.
restart.cpp - You should complete four functions.
- restart_analysis - This scans the log forward from the last
checkpoint, building up information about the database at the time of the
crash. Fill in the code executed when the record being looked at is an
update record, updating the Recovery Xaction table and Recovery Dirty Page
Table being rebuilt.
- redo_update - This is the function for redoing a single action. Add
the code for redoing an update (UPD) record, extracting the necessary information
from the log record data and performing the update.
- restart_redo - This is the second phase of recovery, and restores the
database to its pre-crash state. You should implement the code that handles
each update record, calling redo_update if the action needs to be retaken.
You should consider the pageLSN stored with each page to determine the
necessity of repeating the action, and update the Recovery Dirty Page Table
where necessary.
- restart_undo - This third phase of recovery aborts all transactions
active at the time of the crash, scanning the log backwards and undoing
their actions. Implement the code that handles the case
for undoing an update record. You'll need to work with the Recovery Xaction
table, and you should generate a CLR to be written to the log.
Hint: This is essentially a rollback of the transaction, so look at
the code that normally handles a rollback when a transaction aborts.
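The handling of an update record in the analysis and redo phases described above can be sketched as follows. This assumes the textbook Aries rules and uses hypothetical, simplified tables (plain maps keyed by transaction id and page id); the project's Recovery Xaction table and Recovery Dirty Page Table have their own interfaces, so treat this as a statement of the logic rather than code to paste in.

```cpp
#include <cassert>
#include <map>

using Lsn = int;

// Hypothetical, stripped-down update record.
struct SketchUpd { Lsn lsn; Lsn prevLsn; int xid; int pageId; };

// Hypothetical shapes for the tables rebuilt during analysis.
using RecXactTable  = std::map<int, Lsn>;  // xid -> lastLSN
using RecDirtyPages = std::map<int, Lsn>;  // pageId -> recLSN

// Analysis: what one update record contributes to the two tables.
void analyzeUpdate(const SketchUpd& u, RecXactTable& xt, RecDirtyPages& dpt) {
    xt[u.xid] = u.lsn;             // transaction is active; remember its latest record
    dpt.emplace(u.pageId, u.lsn);  // only the first record to dirty a page sets recLSN
}

// Redo: decide whether an update must be reapplied.
// `pageLsnOnDisk` is the pageLSN read from the page itself.
bool needsRedo(const SketchUpd& u, const RecDirtyPages& dpt, Lsn pageLsnOnDisk) {
    auto it = dpt.find(u.pageId);
    if (it == dpt.end()) return false;     // page was not dirty at the crash
    if (it->second > u.lsn) return false;  // page became dirty only after this update
    return pageLsnOnDisk < u.lsn;          // page has not yet seen this update
}
```

The first two checks use only the table and so avoid fetching the page; the pageLSN comparison is the final authority once the page has been read.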
You should also finish three small methods concerning the control of
the recovery process. Specifically, the functions
findRedoLsn() - identifying the earliest LSN from which the redo process
should start.
keepPerformingUndo() - determining when to stop the undo process.
findNextUndoLsn() - identifying the next log record to be undone in
the undo process.
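One possible reading of these three control functions, following the standard Aries scheme, is sketched below. The signatures in the project will differ; the sketch* names and the map-based tables are hypothetical. It assumes the undo phase keeps, for each transaction still being rolled back, the LSN of its next record to undo, with kNoLsn meaning the transaction is fully undone.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

using Lsn = int;
const Lsn kNoLsn = -1;

// Hypothetical shapes for the recovery tables.
using RecDirtyPages = std::map<int, Lsn>;  // pageId -> recLSN
using RecXactTable  = std::map<int, Lsn>;  // xid -> next LSN to undo (kNoLsn when done)

// Redo starts at the smallest recLSN in the dirty page table.
// Returns kNoLsn if no page was dirty, in which case there is nothing to redo.
Lsn sketchFindRedoLsn(const RecDirtyPages& dpt) {
    Lsn redo = kNoLsn;
    for (const auto& entry : dpt) {
        if (redo == kNoLsn || entry.second < redo) redo = entry.second;
    }
    return redo;
}

// Undo continues while any transaction still has a record left to undo.
bool sketchKeepPerformingUndo(const RecXactTable& xt) {
    for (const auto& entry : xt) {
        if (entry.second != kNoLsn) return true;
    }
    return false;
}

// The next record to undo is the largest pending LSN, so the log is
// traversed strictly backwards across all transactions being rolled back.
Lsn sketchFindNextUndoLsn(const RecXactTable& xt) {
    Lsn next = kNoLsn;
    for (const auto& entry : xt) next = std::max(next, entry.second);
    return next;
}
```

After a record is undone, the owning transaction's entry would be moved back to that record's prevLSN, which is how the loop eventually terminates.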
What to Hand In
- A listing of all source files you modified or added (though I doubt
you'll have to add any).
- The output, with debugging enabled, of your code when run on tests
that demonstrate the full range of functionality. Something like Test 7
would be suitable, with enough detailed output so that I can see the steps
taken by the recovery manager to restore the database after a crash. The
debugging code already written (mainly PrintLogRec) and included in the
code should be sufficient for this purpose.
- An explanation of your code, including any assumptions made, and any
deviations from the standard Aries recovery scheme given in the text. Include
some comments on the design of the recovery manager, including problems
you saw and ways to improve the code.
Grading
- 70% Correctness
- 20% Documentation
- 10% Coding Style
Questions
All questions should be directed to Patrick McClanahan or the staff.
Good luck!