CS 432/33
Assignment 6
Transaction Simulation & Crash Recovery
Due Date:
Wednesday, November 18, 1998
What to turn in:
Just put everything in Goose:
- Your project
- Your output of the tests
- Make sure you organize them to be readable.
This means you should separate the output
into separate files for each test.
Introduction
In this assignment you will complete certain parts of a
database engine. The database is modeled very simply, and runs by
executing a series of transaction operations given to it by the
user. Each transaction performs a series of reads and writes on
the pages of the database, and can commit or abort at any time.
The simulator includes a recovery manager module that keeps a log
of the database's activity. At any point in time, the database
may crash (specifically, when it encounters a special command to
induce a crash). The recovery manager then performs a restart to
restore the database to a correct state, using the Aries recovery
algorithm.
The Transaction Simulator
The transaction simulator works just like a mini-database.
There is a series of pages stored in a file on disk, accessed
through a buffer. Transactions are modeled as a sequence of reads
and writes to the pages of the database. A log is kept of all
changes to database pages, with write-ahead logging (WAL) used to ensure that committed
transactions are fully represented by the log as written to
stable storage. Checkpoints can be inserted at will.
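The WAL rule mentioned above can be illustrated with a minimal sketch. All names here (Page, SketchLog, writePageToDisk, commit) are hypothetical and are not the simulator's classes; the sketch only assumes that an LSN is the position of a record in the log.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types; the simulator's real classes differ.
using Lsn = std::int64_t;

struct Page {
    int  id = 0;
    Lsn  pageLSN = -1;   // LSN of the last log record applied to this page
    bool dirty = false;
};

struct SketchLog {
    std::vector<int> records;   // record payloads (contents do not matter here)
    Lsn flushedLSN = -1;        // highest LSN known to be on stable storage

    // Append a record to the in-memory tail of the log and return its LSN.
    Lsn append(int payload) {
        records.push_back(payload);
        return static_cast<Lsn>(records.size()) - 1;
    }

    // Force every record up to and including lsn to stable storage.
    void flushUpTo(Lsn lsn) {
        if (lsn > flushedLSN) flushedLSN = lsn;
    }
};

// WAL rule for pages: the log must be forced up to pageLSN before the
// page itself is allowed to reach disk.
void writePageToDisk(Page& page, SketchLog& log) {
    if (page.pageLSN > log.flushedLSN) log.flushUpTo(page.pageLSN);
    assert(log.flushedLSN >= page.pageLSN);
    page.dirty = false;         // stand-in for the actual disk write
}

// WAL rule for commits: a transaction counts as committed only once its
// commit record is on stable storage.
void commit(SketchLog& log, Lsn commitLSN) {
    log.flushUpTo(commitLSN);
}
```

With this ordering, any change that reaches the database file is already described in the stable log, which is what lets restart redo or undo it.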
This assignment uses an application called MARS which is a GUI
for testing the simulator. The input comes from .mars files that
can be found in the folder Tests. These files can be edited using
any simple text editor. The syntax for commands to the simulator
is discussed in Getting Started with the
Simulator.
A breakdown of the major components follows:
- BufMgr - This module operates much like the buffer you
implemented for project 1. When a page is to be read or
written to, a pinPage() request is made, and the page is,
if not already present, loaded into the buffer pool. On
request, one or all pages can be flushed. Pages are
released with an unpinPage() call.
The buffer for this project is somewhat small, being
intended to illustrate the role of the buffer in crash
recovery.
- Xaction_table - This keeps track of all the transactions
currently active in the database. It also keeps
information on each active transaction, namely the oldest
and most recent log sequence numbers (LSNs) for each
transaction.
- The log subsystem:
- log - This is the log that is used in WAL while
the transactions are executing. It provides
simple sequential access to the log files for
reading purposes, and can append new records to
the end of the log. The log records are of
uniform size, and the log does not concern
itself with the type of log being written.
- logrecord - This represents a single log entry. A
special field is kept in each record to identify
the variety of log record (commit, update, etc.)
that it is. It contains a generic data buffer,
which holds the information specific to each type
of log record.
- masterlog - This keeps track of the
checkpointing, so that the recovery process will
not need to go back too far to reconstruct the
database at the time of the crash.
- LogData structures - These classes represent each type of
update, and the information associated with it like
prevLSN, xaction_id, and the page affected. Each Log Data
structure is stored in the data field of a log record,
and should be accessed by typecasting that data to the
appropriate LogData type (UPD, CLR, ABORT).
- Recovery Manager - consists of several related modules
for logging & recovery procedures. While the database
is running, each transaction has its own recovery manager
that is responsible for logging its actions. Only one
recovery module, though, is necessary for performing
crash recovery. The pieces of the manager are:
- The logging functionality (logfunc.cpp) - This
translates write requests by transactions into
Update records that are written to the log, and
creates CLR records whenever a transaction aborts on
its own.
- rollback.cpp - This performs a rollback on a
transaction that has just aborted on its own, undoing
the changes and making sure that none of them
persist even after a crash.
- restart.cpp - This is the part responsible for
bringing the database up to a consistent state
following a crash. It performs the three phases
of the Aries recovery algorithm based on the
information written to the log.
- Checkpoint - This generates checkpoints and
writes them to the log, extracting the
information from the Xaction table and getting
the Dirty Page Table from the Buffer.
- Recovery DirtyPageTable - this is the list of
possibly dirty pages that is built during the
analysis phase of recovery.
- Recovery XactTable - this is the list of active
transactions that is built during the analysis
phase of recovery.
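As a rough illustration of how a type tag and a generic data buffer fit together, here is a small sketch. Every name in it (RecType, UpdData, SketchLogRecord, makeUpdateRecord, readUpdate) is a hypothetical stand-in, and the field layout is an assumption; consult the project headers for the real declarations. The component list above describes typecasting the data field; this sketch copies the same bytes with std::memcpy instead, which sidesteps alignment concerns.

```cpp
#include <cassert>
#include <cstring>

// All names below are hypothetical stand-ins, not the simulator's declarations.
enum class RecType { Update, Clr, Abort, Commit };

struct UpdData {            // payload carried by an update record
    int  xactionId;
    long prevLSN;           // previous log record of the same transaction
    int  pageId;
    int  oldValue;          // before-image, needed to undo
    int  newValue;          // after-image, needed to redo
};

struct SketchLogRecord {
    RecType type;           // tag that says how to interpret data[]
    char    data[64];       // generic buffer; same size for every record
};

static_assert(sizeof(UpdData) <= sizeof(SketchLogRecord::data),
              "payload must fit in the generic buffer");

// Pack an update payload into a uniform-size log record.
SketchLogRecord makeUpdateRecord(const UpdData& upd) {
    SketchLogRecord rec{};
    rec.type = RecType::Update;
    std::memcpy(rec.data, &upd, sizeof upd);
    return rec;
}

// Recover the typed payload; check the tag before interpreting the bytes.
UpdData readUpdate(const SketchLogRecord& rec) {
    assert(rec.type == RecType::Update);
    UpdData upd;
    std::memcpy(&upd, rec.data, sizeof upd);
    return upd;
}
```

The point of the tag check is that the log itself does not know what kind of record it stores, so the reader has to decide which LogData type applies.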
The recmgr_tab.c module is responsible for parsing the input.
It is computer generated, and not very fit for modification.
Don't worry too much about what it does; just know that it takes
one command at a time from the standard input (which we might
have redirected to point to a file) and converts it into a
database operation.
The handle.cpp module contains the standalone functions for
performing the commands. There is one for each operation, and these
are the "top-level" functions that are called first when an
operation is performed. The functions in handle.cpp are the functions
that will call the Recovery Manager functions that you will be
finishing, and will pass in the relevant data about the
operations.
The complete code is available in the CS432/A6 folder in Goose
as a complete project. You can open it by double-clicking the
Mars.dsw file. All you need to do is fill in the gaps in some of
the project's .cpp files. See below.
When you have made your modifications and are ready to test
your code, run the Mars.exe file that Visual Studio generates
when you compile the project. It is recommended that you build
the release version of the project: open the Build menu, select
"Set Active Configuration", and choose "Release". The .exe file
can then be found in the "Release" folder. Simply double-click it
to start the transaction manager.
Your Task
You will implement various pieces of the recovery module.
Specifically, you will implement the functionality to handle the
most basic and common of transactions, the write
(WriteUpdateLog()). You must add code to the logging mechanism so
that writes (also called updates) to the pages of the DB
are reflected in the log, and then implement those parts of the
restart mechanism that deal specifically with those log entries
to restore the database to a consistent state.
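To make the bookkeeping behind a logged write concrete, here is a minimal sketch of the usual steps. It is not the required implementation of WriteUpdateLog(): the types (SketchDb, UpdateRec, XactEntry, PageFrame), the function writeUpdateSketch, and the in-memory log are all assumptions made for illustration, and the real signatures are in the project code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical types for illustration only.
using Lsn = long;
const Lsn kNoLsn = -1;

struct UpdateRec { int xid; Lsn prevLSN; int pageId; int oldVal; int newVal; };
struct XactEntry { Lsn firstLSN = kNoLsn; Lsn lastLSN = kNoLsn; };
struct PageFrame { int value = 0; Lsn pageLSN = kNoLsn; };

struct SketchDb {
    std::vector<UpdateRec>   log;        // LSN = index into this vector
    std::map<int, XactEntry> xactTable;  // transaction id -> LSN bookkeeping
    std::map<int, PageFrame> pages;      // page id -> buffered page
    std::map<int, Lsn>       dirtyPages; // page id -> recLSN (first dirtying LSN)
};

Lsn writeUpdateSketch(SketchDb& db, int xid, int pageId, int newVal) {
    XactEntry& x = db.xactTable[xid];
    PageFrame& p = db.pages[pageId];

    // 1. Build the record, chaining it to the transaction's previous record
    //    and saving the before-image so the write can be undone later.
    UpdateRec rec{xid, x.lastLSN, pageId, p.value, newVal};

    // 2. Append to the log before touching the page (write-ahead).
    db.log.push_back(rec);
    Lsn lsn = static_cast<Lsn>(db.log.size()) - 1;
    assert(x.lastLSN < lsn);             // the prevLSN chain points backwards

    // 3. Update the transaction table entry.
    if (x.firstLSN == kNoLsn) x.firstLSN = lsn;
    x.lastLSN = lsn;

    // 4. Apply the write and stamp the page with the record's LSN.
    p.value = newVal;
    p.pageLSN = lsn;

    // 5. Record the page as dirty; emplace keeps an existing recLSN.
    db.dirtyPages.emplace(pageId, lsn);
    return lsn;
}
```

Two writes by the same transaction to the same page then produce two chained records, while the page's recLSN stays at the first of them.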
The code you will write belongs in these modules:
- logfunc.cpp - You should complete this function:
- WriteUpdateLog() - Given the information about a
given update, generate an update log record, and
update all affected information in the rest of
the database.
- restart.cpp - You should complete these four main
functions:
- restart_analysis - This scans the log
forward from the last checkpoint, building up
information about the database at the time of the
crash. Fill in the code executed when the record
being looked at is an update record, updating the
Recovery Xaction table and Recovery Dirty Page
Table being rebuilt.
- redo_update - This is the function for redoing a
single action. Add the code for redoing an update
(UPD) record, extracting the necessary
information from the log record data and
performing the update.
- restart_redo - This is the second phase of
recovery, and restores the database to its
pre-crash state. You should implement the code
that handles each update record, calling
redo_update if the action needs to be retaken.
You should consider the pageLSN stored with each
page to determine the necessity of repeating the
action, and update the Recovery Dirty Page Table
where necessary.
- restart_undo - This third phase of recovery
aborts all transactions active at the time of the
crash, scanning the log backwards and
undoing the actions of those transactions.
Implement the code that handles the case for
undoing an update record. You'll need to work
with the Recovery Xaction table, and you should
generate a CLR to be written to the log.
Hint: This is essentially a rollback of the
transaction, so look at the code that normally
handles a rollback when a transaction aborts.
- You should also finish three small methods
concerning the control of the recovery process:
- findRedoLsn() - identifying the earliest LSN from
which the redo process should start.
- keepPerformingUndo() - determining when to stop
the undo process.
- findNextUndoLsn() - identifying the next log
record to be undone in the undo process.
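The three phases and the three control methods listed above can be sketched over an in-memory log as follows. This is a simplified illustration under stated assumptions, not the simulator's code: the Rec, DiskPage, and Recovery types are hypothetical, there is no checkpoint (analysis starts with empty tables), each page holds a single integer, and end records are omitted. The comments name the assignment methods only to show roughly where their logic would sit.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Lsn = long;
const Lsn kNoLsn = -1;

enum class Kind { Update, Clr, Commit };

struct Rec {                              // hypothetical log record
    Kind kind; int xid; Lsn prevLSN;
    int pageId; int oldVal; int newVal;   // used by Update and Clr
    Lsn undoNextLSN;                      // used by Clr only
};

struct DiskPage { int value = 0; Lsn pageLSN = kNoLsn; };

struct Recovery {
    std::vector<Rec>&        log;         // LSN = index into the vector
    std::map<int, DiskPage>& pages;       // page state found after the crash
    std::map<int, Lsn> xactLast;          // recovery Xaction table: xid -> lastLSN
    std::map<int, Lsn> dirty;             // recovery dirty page table: page -> recLSN

    // Phase 1: scan forward and rebuild both tables.
    void analysis(Lsn start) {
        for (Lsn lsn = start; lsn < static_cast<Lsn>(log.size()); ++lsn) {
            const Rec& r = log[lsn];
            if (r.kind == Kind::Commit) { xactLast.erase(r.xid); continue; }
            xactLast[r.xid] = lsn;        // transaction is (still) active
            dirty.emplace(r.pageId, lsn); // keeps the earliest recLSN
        }
    }

    // Smallest recLSN in the dirty page table (cf. findRedoLsn()).
    Lsn findRedoLsn() const {
        Lsn m = kNoLsn;
        for (const auto& e : dirty)
            if (m == kNoLsn || e.second < m) m = e.second;
        return m;
    }

    // Phase 2: repeat history for every change that did not reach disk.
    void redo() {
        Lsn start = findRedoLsn();
        if (start == kNoLsn) return;
        for (Lsn lsn = start; lsn < static_cast<Lsn>(log.size()); ++lsn) {
            const Rec& r = log[lsn];
            if (r.kind == Kind::Commit) continue;
            auto d = dirty.find(r.pageId);
            if (d == dirty.end() || d->second > lsn) continue;
            DiskPage& p = pages[r.pageId];
            if (p.pageLSN >= lsn) {       // change is already on the page
                d->second = p.pageLSN + 1;
                continue;
            }
            p.value = r.newVal;
            p.pageLSN = lsn;
        }
    }

    // Phase 3: roll back every transaction still in the Xaction table.
    void undo() {
        std::set<Lsn> toUndo;
        for (const auto& e : xactLast) toUndo.insert(e.second);
        while (!toUndo.empty()) {         // cf. keepPerformingUndo()
            Lsn lsn = *toUndo.rbegin();   // largest LSN first (cf. findNextUndoLsn())
            toUndo.erase(lsn);
            assert(lsn >= 0 && lsn < static_cast<Lsn>(log.size()));
            const Rec r = log[lsn];       // copy: push_back below may reallocate
            Lsn next = r.prevLSN;
            if (r.kind == Kind::Clr) {
                next = r.undoNextLSN;     // skip work that was already undone
            } else {
                // The CLR's "new" value is the restored before-image.
                Rec clr{Kind::Clr, r.xid, xactLast[r.xid], r.pageId,
                        r.newVal, r.oldVal, r.prevLSN};
                log.push_back(clr);
                Lsn clrLsn = static_cast<Lsn>(log.size()) - 1;
                xactLast[r.xid] = clrLsn;
                DiskPage& p = pages[r.pageId];
                p.value = r.oldVal;
                p.pageLSN = clrLsn;
            }
            if (next != kNoLsn) toUndo.insert(next);
        }
    }
};
```

Because each CLR stores an undoNextLSN, a crash during undo does not cause the same update to be undone twice: the next restart redoes the CLR and then continues from the record it points to.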
Reference
The following provide more detailed explanations about the
classes and types that will be useful for this assignment.
Minor Bugs
- When you run Mars with a new test, it may not
produce values for the read commands (e.g., read 1 4;
will display 'read returned 0'). The symptom is that ALL
the read commands return 0. Close Mars, run it again, and
load the test you want from the list of recent test files
in the Transactions menu (as in Word, Excel, etc.); it
should then work. This bug should not hamper your work in
any way, but we're working to fix it.
Tips
- If you want to print out something other than a log
record (which would use PrintLogRec ( )), use the
function WriteLogOutput ( char * ). In some files, it may
have to be extern'ed before it can be used.
- e.g. -
extern void WriteLogOutput(char *);
...
WriteLogOutput( "now entering function WriteUpdateLog( )" );
...
char s[64];  // large enough for the message text plus the LSN digits
sprintf( s, "the LSN of the record just written was %d", lsn.GetOffset( ) );
WriteLogOutput( s );
- But for more efficient debugging, use the VC++ debugger.
Make sure you are in the Debug version and not the
Release version. Go under the menu Build -> Set Active
Configuration... and select Mars -> Win32 Debug.
What to Hand In
- A listing of all source files you modified or added
(though I doubt you'll have to add any).
- The output, with debugging enabled, of your code when run
on tests that demonstrate the full range of
functionality. The output from Test 7, 8, 9 would be
suitable. Be sure to include enough detailed output so that
I can see the steps taken by the recovery manager to
restore the database after a crash. The debugging code
already written (mainly PrintLogRec) and included in the
code should be sufficient for this purpose.
- An explanation of your code, including any assumptions
made, and any deviations from the standard Aries recovery
scheme given in the text. Include some comments on the
design of the recovery manager, including problems you
saw and ways to improve the code.
Grading
- 70% Correctness
- 20% Documentation
- 10% Coding Style
Questions
All questions should be directed to the course staff. Good luck!