Project 2: Buffer Manager

This assignment is due on October 18, 2006.

Introduction

Page-1

Read first the whole assignment carefully, and review the relevant textbook materials, before you start implementing.

In this assignment, you will implement a buffer manager. The buffer manager is responsible for the follow tasks:

In addition, you will implement the Clock buffer replacement policy. Finally, you will work on a Hash Directory for efficient buffer page management.

We designed this assignment in a more flexible way compared to the last one. You are still given a well-defined interface for the major database component Buffer Manager, and required to implement it. However, you are free to design the interfaces for other classes that will be used in this assignment. We have provided interfaces for these classes as well for your reference, and you are free to base your implementation on them as well.

The interfaces for the underlying Disk Space Manager (DB class and Page class) are given. In particular, you should read in db.h the interface description of the DB class, which you will use extensively in this assignment.

Design Overview

The buffer pool is a collection of frames (page-sized sequence of main memory bytes) that is managed by the Buffer Manager. It should be stored as an array frames[bufsize] of Frame objects. Each Frame object is a record with the following fields:

pid, data, pin_count, dirty

pid is a PageID object, data is a pointer to a Page object, pin_count is an integer, and dirty is a boolean. This describes the page that is stored in the corresponding frame. The structure definition of Frame can be found in frame.h. However, unlike the BufMgr interface above which we require you to conform to, feel free to design you own interface here.

Recall a page is identified by a pid that is generated by the DB class when the page is allocated, and is unique over all pages in the database. The PageID type is defined as an integer type in minirel.h.

The Buffer Manager Interface

In the following methods, return OK if the operation is successful, and FAIL in error conditions (e.g. the buffer is full). The methods that you need to implement are described below:

BufMgr(int bufsize): Allocate "bufsize" pages (frames) for the pool in main memory.

~BufMgr(): Flush all dirty pages in the pool to disk before shutting down, and deallocate the buffer pool in main memory.

Status PinPage(PageID pid, Page*& page, bool isEmpty): Check if this page is in buffer pool. If so, increase its pin_count, and set the output page pointer page to the page pointed to by the frame. If not, choose a victim frame according to the buffer replacement policy, and write the current page in that frame back to disk if the frame holds a page and it is dirty. If this page is not in buffer pool, and isEmpty is true, the buffer manager pins the frame to an empty page, by setting the page ID of the frame to the input pid. Otherwise, the frame should read the input page of pid from disk (using the appropriate DB class method DB::ReadPage). The global instance of Disk Space Manager is MINIBASE_DB. You should involve the member functions of DB class from MINIBASE_DB.

Status UnpinPage(PageID pid, bool dirty): Find the frame, and decrement pin_count of the frame (If pin_count==0 before this decrement operation, return FAIL). dirty==true if the client has modified the page. In this case, this call should set the dirty bit for this frame.

Status NewPage(PageID& firstPid, Page*& firstPage, int howMany): Allocate a run of new pages according to howMany (using the appropriate DB class method DB::AllocatePage), and set firstPid and firstPage appropriately. Then try to pin the first page in the buffer.

Status FreePage(PageID pid): Find the page in the buffer pool, and deallocate the page (using DB::DeallocatePage).

Status FlushPage(PageID pid): Find the frame that holds in the page. If that frame is still pinned, return FAIL. Otherwise, write the page back to disk (using DB::WritePage) if it is dirty.

Status FlushAllPages(): Flush all pages of the buffer pool to disk in all cases. However, if there is some page pinned in the buffer, return FAIL. Otherwise return OK.

unsigned int GetNumOfUnpinnedFrames(): Return the number of unpinned buffer frames in the buffer pool.

int FindFrame( PageID pid ): This is a private method that will be invoked only within the BufMgr class. It returns the ID of the frame, referred to as frame number, which the page occupies in the buffer pool. If you search the bucket and don't find a pair containing this page number, the page is not in the pool, and you should return INVALID_FRAME.

Finally, the two functions for resetting the statistics and reporting the statistics using standard input/output stream libraries, ResetStat() and PrintStat(), have been implemented for you. Your buffer manager will collect the following statistics:

1) Number of PinPage requests made.
2) Number of PinPage requests that result in a miss (i.e. the page is not in the buffer when the pin request is made)
3) Number of "dirty" pages written to disk.
4) Total time taken to run the test. (This will be collected by the code in the test cases)

The Clock Buffer Replacement Policy

Theoretically, the best candidate page for replacement is the page that will be last requested in the future. Since implementing such policy requires a future predicting oracle, all buffer replacement policies try to approximate it one way or another. The LRU policy, for example, uses the past access pattern as an indication for the future. In this assignment, you will implement a variant of LRU policy, called Clock. The description of Clock can be found in Section 7.4.1 of the text book. A suggested interface for the buffer replacement policy can be found in replacer.h. However, unlike the BufMgr interface above which we require you to conform to, feel free to design you own interface here.

The Hash Directory for Efficient Buffer Page Management

Given a page ID, one can look up what frame it occupies by sequentially scanning frames[bufsize]. To perform the look up more efficiently, we need an auxiliary data structure, a collection of <page number, frame number> pairs, where the first field page number is indexed. We use a simple hash table for this purpose. The hash table should be implemented (entirely in main memory) by using an array of buckets. Each bucket is a linked list of <page number, frame number> pairs. Given a page number, you should apply a hash function to find the directory entry pointing to the bucket that contains the frame number for this page, if the page is in the buffer pool. If you find such a pair, it will tell you the frame in which the page resides. This is illustrated in Figure 1:
 
 

The hash function must distribute values in the domain of the search field uniformly over the collection of buckets. If we have HTSIZE buckets, numbered 0 through M-1, a hash function h of the form

works well in practice. HTSIZE should be chosen to be a prime number.

If the Hash Directory is used to manage buffer pages, it needs to be maintained/updated in some of the member functions of BufMgr to  stay consistent. For example, when a page is pinned, the buffer manager should do the following: Check the buffer pool (by using the hash table) to see if it contains the requested page. If the page is not in the pool, it should be brought in according to the description above in PinPage(). In addition, it should delete the entry for the old page from the Hash Directory, and insert an entry for the new page.

The interfaces of the classes related with hash table can be found in hash.h. However, unlike the BufMgr interface above which we require you to conform to, feel free to design you own interface here.

Compilation and Testing

The source code for the project can be downloaded from the course management system. If you unzip this file, you will see the following directories and files.

Top-level Directory

Directories "src" and "include":

Directory "lib" contains the library files you will need in order to generate an executable.

What You Need to Do

You are required to modify and fill in the gaps in bufmgr.cpp and bufmgr.h, specifically :

What to Turn In

You should submit the same set of files given to you at the beginning of this assignment, plus any additional header and source files you have created for this assignment. The code should compile under the Visual Studio solution file you submit. Before submission to CMS, zip up your set of files into BufMgr.zip.

Important note: Your assignment should compile and run on Visual Studio .Net on Windows XP machines (such as available in the CSUGLAB). If you like, you can do your implementation at home on personal machines. However, it is your responsibility to get the code to work on the departmental machines before you submit it (since the TAs will test the code in the departmental machines).

Please remember late submissions will not be accepted. Make sure to start early!

FAQ of Assignment 2

Q: Which component should manage the memory that holds pages in buffer pool?

A: The Frame object should. When a frame is initialized, the memory for holding a page should be allocated. When the object is destructed, the memory should be released.

Q: After all the buffer manager tests are run, the system may crash right before it exits (when delete minibase_globals is called in main() function). What's happening?

A: There might be a bug in the provided library files. You should not worry about it though. Just focus on making your code pass the buffer manager test cases.

Q: How do we use MapIterator?

A: MapIterator can be used in Bucket to iterate over the elements in a Map* container. The iterator design pattern in general increases the modularity between the container component (the Map* container in this case) and its client component (the Bucket component that uses Map* container in this case). The use of MapIterator is however optional. That is, in Bucket, you could iterate over the elements in Map* container without using MapIterator.

Q: What is the function Map* MapIterator::operator() () supposed to do?

A: The Map* MapIterator::operator() () function corresponds to the "getNext" function of any iterator interface. In this case, it is used to get the next Map* element from the Map* container. Here is the sample code which uses the MapIterator to iterate over all elements in the Map* container. Hope it helps.

//suppose maps is the container that stores a collection of Map* elements.
MapIterator next(maps);
Map *curr;

// Iterate over the list and do something.
while (curr = next()) {
    //do something here
}
 

Q: In test 2, line 212, the code tries to free a page (by calling MINIBASE_BM->FreePage) that is pinned once, but the buffer is full. We need to invoke DB::DeallocatePage to deallocate the page, but DB::DeallocatePage will in turn call BufMgr::PinPage and UnpinPage to load/unload directory page. Since the buffer is full, the directory page cannot be pinned into the buffer. Would this make the invocation of MINIBASE_BM->FreePage fail?

A: Not really. When the buffer is full, and you try to free a page that is pinned once, you need to *first* unpin it, and *then* deallocate it in the database. That way, when you invoke DB::DeallocatePage, there will be a free frame available to hold the directory page pinned in DB::DeallocatePage.

Q: What is the function Frame::Free() supposed to do?

A: For Frame::Free(), the intention was to deallocate the page held by this frame from the database (by invoking DB::DeallocatePage). This way, it can be invoked from BufMgr::FreePage(). Note that "free" does not mean to free the memory held in the frame itself. However, you can of course "inline" its logic in BufMgr::FreePage(), in which case you do not need this function Frame::Free() any more.

Q: When I start the system, it crashes at executing line 17 of main() function. What happens?

A: Basically in line 17 where it executes minibase_globals = new SystemDefs(status, "MINIBASE.DB", 2000, bufSize, "Clock"), some member functions in BufMgr will be invoked, such as PinPage. You can verify this by setting break points to your implementation of the BufMgr member functions. Therefore if you did not implement these BufMgr member functions correctly, the system will crash.

In particular, when BufMgr::PinPage and BufMgr::UnpinPage are invocated by DB::DB() indirectly from line 17 of main.cpp, the page that is pinned and returned is a directory page, with pid = 0 or 1. You should make sure that PinPage and UnpinPage corrects pings and unpins the directory pages.

Due to an issue in the library provided, the system may not behavior correctly when the size of a BufMgr class instance is too big. To reduce the size, it is recommended that you define member variables HashTable *hashTable and Replacer *replacer in BufMgr, instead of HashTable hashTable and Replacer replacer.

You can use either ClockFrame **frames or ClockFrame *frames in BufMgr. It does not make a difference in the size of a BufMgr class instance.


Mingsheng Hong
Last modified: 10/17/2006