CS414 Programming Project 3

There is a minor bug in disk.sync.c: you can download a corrected version here.

In the final project, you will implement a file system for minithreads. But don't worry -- rather than writing kernel code, you will implement the file system at user-level. To help you, we have provided code for a simulated disk device in disk.h. The disk device simulates a disk by translating block read and write operations into operations on a single Windows NT file.

Project organisation

Your task is to implement a hierarchical, Unix-like file system on top of the disk emulator. The file system interface should have the following operations:

minifile_t minifile_create(char *filename)
minifile_t minifile_open(char *filename, char *mode)
int minifile_read(minifile_t file, char *data, int maxlen)
int minifile_write(minifile_t file, char *data, int len)
int minifile_close(minifile_t)
int minifile_unlink(char *filename) // delete file
int minifile_mkdir(char *dirname)
int minifile_rmdir(char *dirname)
int minifile_stat(char *path) 
int minifile_cd(char *path) 
char **minifile_ls(char *path) 
char *minifile_pwd()

The minifile.h file provides more information on these functions and their intended behaviour. They are very similar to the traditional Unix operations for accessing files (except for cd and ls, which are provided by utilities). If you do not modify this interface, you should be able to compile and run the shell program we have provided (included with the code for this project, see below). You should then be able to create files and directories from the command line.

To get an idea of what these functions should do, look them up (omitting the "minifile_" prefix) using the man command on a Unix system, or in the Visual Studio help. The only major difference from the traditional interface is that our minifile_open takes arguments like the fopen call, instead of open.

Note that not all of the functions are file operations. For instance, the concept of the "current working directory" is not a global abstraction that applies to the file-system, but a piece of local state kept with each process (minithread, in this case). It follows that minifile_pwd() returns the path to the current working directory associated with the calling thread. Your file-system should NOT have any such state as global variables shared across independent threads.

A robust system needs to explain why an operation has failed when an error occurs. However, to cut down implementation time in this project, don't worry about reporting detailed error codes when something goes wrong, just returning -1 from the function (or some other appropriate error value) is enough.

File system requirements. Your file system should support variable-sized files via a Unix-like inode mechanism, and reuse of blocks from unlinked (deleted) files. Make sure to test your code extensively. You are not required to write code which is robust to disk and system crashes, or provides crash recovery, but you should ensure a graceful shutdown when the application terminates.

Two options

This project comes with two versions of the disk.h and disk.c files: a simple synchronous implementation in disk.sync.{h,c}, and a more complicated asynchronous implementation in disk.async.{h,c}. To use one of them, copy the relevant versions of the files to disk.h and disk.c.

Either version of the disk simulator is relatively straightforward. To create a disk, use the disk_create() function, providing a name for your disk. You can also specify some disk flags to control disk behavior, and give a maximum size for the disk. Of course, you should only create a disk if there is no disk with that name already, otherwise you will lose all the data on the existing disk. Use disk_startup() to use a disk you have previously created. To begin using the disk, just issue disk requests through the disk_read() and disk_write calls. The format of the requests is shown in disk.h.

To complete this project, you are required to use the synchronous disk implementation. This simplifies the concurrency control required in your file system -- just put a lock on the disk and everything will be fine -- but it has the disadvantage of being unrealistic. If you use the synchronous disk, you will be assigned a grade out of 10, just as with previous projects (each project is worth 10% of the overall course grade).

However, if you wish, once you have your file system implementation working with a synchronous disk, you may replace the synchronous disk with the more realistic asynchronous disk, which emulates nonblocking reads and writes and uses interrupts to indicate I/O request completion. This allows you to make the file system multithreaded (i.e. more than one thread can perform file I/O concurrently), but requires extensive rewriting of the file system conconcurrency control and the way disk operations are used. If you rewrite your file system to use the asynchronous disk, you will be eligible for up to 3 extra points, which can make up for points lost on previous projects or on this project.

Warning: If you attempt the extra credit option, make sure you have the synchronous disk implementation of the file system working first.

What to submit: Regardless of whether or not you attempt the asynchronous disk option for extra credit, submit your synchronous disk implementation. If you attempt the extra credit option, also submit your asynchronous disk implementation.

Guidelines for filesystem design

Disk organisation: Use the first block from the disk as the superblock followed by the disk blocks that contain the inodes (about 10% of the disk). Use the rest of the disk for data blocks.

Superblock: The superblock contains global information about the file system. Some fields in the superblock might be: the first free inode (if the free inodes are organized in a linked list), the first free data block (more about this later), the disk address of the root inode (the entry point in the filesystem), the total number of inodes and blocks, the overall size of the filesystem and any other thing you consider useful. Also, a predefined number in the first 4 bytes of the superblock (called a "magic number", for instance, 0x8888) will help you determine if you have a legitimate filesystem on a disk or not.

Inodes: An inode contains information about the file or directory it represents. Any relevant information about the file, except the name, should be kept in its inode. This might include: type (file or directory), size, the disk address of the next free inode (to maintain the list of free inodes), etc. Also the inode contains information about the data blocks used by the file. You have to be able to address 11 direct blocks and one indirect block, which contains pointers to other disk blocks. You can find more information about direct and indirect blocks from lecture notes or Silberschatz and Galvin.

Directories: Directories are "special" files that contain the mapping between file or directory names and inode numbers (or disk addresses). The only differences between a directory and a regular file are: directories cannot be deleted with unlink and can be deleted only if they are empty, they have a fixed format, and they have the type set to DIRECTORY. Imposing an upper limit on the name field in a directory entry is acceptable, but it must be at least 256 characters. Using a linear search to find the inode number that corresponds to a particular filename in the directory is sufficient.

Paths have the general form /dir1/dir2/.../dirn/filename. The first / refers to the root of the file system; other slashes are just separators.

Underlying disk file: Since you do not have to support mount points, your minithreads system will only be able to support one filesystem. This filesystem should reside on a virtual disk that uses the file MINIFILESYSTEM in the directory with your code. Write a C program, called mkfs.exe, that creates an empty filesystem on this disk, containing only one directory (the root directory) with no entries.

Hints for the extra credit option

The asynchronous disk has the same interface as the sychronous disk, with the addition of an install_disk_handler function to install your handler for disk interrupts. This function must be called before any other disk operation: create, startup, etc. Most of the disk operations are replaced with asynchronous ones which do not block, but raise a disk interrupt (calling your interrupt handler) when they complete.

The disk simulator is not a completely accurate simulation of a real disk's behaviour, but it does have the property of reordering queued requests into an arbitrary order before executing them (in practice fact, an efficient disk controller will reorder requests quite aggressively). Consequently, if you have a series of blocks that need to be written with a well-defined order (e.g. block A before B before C), then your file system code must block so as to ensure that request B does not execute before the request for A has been completed.

To enable multiple threads to access files at the same time, you must add file-level locking. Unfortunately, since several file operations operate on more than one file or directory at a time, this may lead to a deadlock! The following lock acquisition protocol will ensure you do not have circular waiting:

Do not hold multiple locks during pathname translation.
Always acquire locks "down the directory tree": i.e. always acquire a directory lock before you acquire a lock on a file in that directory. This will ensure that you will not get a deadlock when you try and get a lock on a directory and a regular file at the same time.
If you have to acquire a lock on two directories at the same time, acquire them in the order of their inode numbers. You should only have to worry about this in the rename operation.
Do not acquire a lock a second time if you already hold it! Once again, this is a problem with the pathname translation during a rename (e.g. we could move a file into a directory which is in a subdirectory of the original location).

How to get started

Make a backup copy of your code from the previous project
Download the files for this project: minithreads3.zip. These comprise the disk device code, minifile interface, and a shell.
Move the new files into the same directory as your minithreads code.
Copy disk.sync.h and disk.sync.c to disk.h, disk.c respectively.

We have included a simple command shell in shell.c, which you can link against your minifile implementation. It should enable you to test your code from the command line.

Submitting the project

Zip up your source files and Makefile for the project and mail the zip file to Ben Atkin. In the e-mail message, include the names of the people in your group, and whether you attempted the extra credit option for the project.