CS414 Programming Project 3

In the final project, you will implement a file system for minithreads. But don't worry -- rather than writing kernel code, you will implement the file system at user-level. To help you, we have provided code for a simulated disk device in disk.h. The disk device simulates a disk on top of a single Windows NT file, translating block read and write operations into operations on the underlying file.

Project 3 update: if you have had trouble compiling the project 3 code with shell.c, download this new Makefile. With this change, Minithreads should now compile. It is normal to get a segmentation fault immediately if you haven't implemented the minifile functions.

Project organisation

Your task is to implement a hierarchical, Unix-like file system on top of the disk emulator. The file system interface should have the following operations:

minifile_t minifile_create(char *filename)
minifile_t minifile_open(char *filename, char *mode)
int minifile_read(minifile_t file, char *data, int maxlen)
int minifile_write(minifile_t file, char *data, int len)
int minifile_close(minifile_t file)
int minifile_unlink(char *filename) (delete file)
int minifile_mkdir(char *dirname)
int minifile_rmdir(char *dirname)
int minifile_stat(char *path) 
int minifile_cd(char *path) 
char **minifile_ls(char *path) 
char *minifile_pwd()

These functions correspond very closely to the traditional Unix operations for accessing files (except for cd and ls, which are provided by utilities). If you do not modify this interface, you should be able to compile and run the shell program we have provided (included with the code for this project, see below). You should then be able to create files and directories from the command line.

To get an idea of what these functions should do, look them up (omitting the "minifile_" prefix) using the man command on a Unix system, or in the Visual Studio help. The only major difference from the traditional interface is that our minifile_open takes arguments like the fopen call, instead of open.

Note that not all of the functions are file operations. For instance, the concept of the "current working directory" is not a global abstraction that applies to the file-system, but a piece of local state kept with each process (minithread, in this case). It follows that minifile_pwd() returns the path to the current working directory associated with the calling thread. Your file-system should NOT have any such state as global variables shared across independent threads.
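As a rough illustration of keeping the working directory as per-thread state, the sketch below resolves a path against a small state record that each minithread would own. The names thread_fs_state_t, fs_state_init and fs_resolve, and the MAX_PATH_LEN limit, are hypothetical; how the record is attached to a minithread depends on your own thread implementation and is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 1024   /* assumed limit, not taken from the handout */

/* Hypothetical per-thread record: one copy per minithread, never global. */
typedef struct {
    char cwd[MAX_PATH_LEN];  /* absolute path of the current working directory */
} thread_fs_state_t;

/* Every thread starts in the root directory. */
void fs_state_init(thread_fs_state_t *s)
{
    strcpy(s->cwd, "/");
}

/* Turn path into an absolute path using the caller's own cwd.
 * Returns 0 on success, -1 if the result does not fit in out. */
int fs_resolve(const thread_fs_state_t *s, const char *path,
               char *out, size_t outlen)
{
    int n;
    assert(s != NULL && path != NULL && out != NULL);
    if (path[0] == '/')
        n = snprintf(out, outlen, "%s", path);           /* already absolute */
    else if (strcmp(s->cwd, "/") == 0)
        n = snprintf(out, outlen, "/%s", path);          /* avoid a double slash */
    else
        n = snprintf(out, outlen, "%s/%s", s->cwd, path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this arrangement, two threads that call minifile_cd() with different arguments never see each other's working directory, because each resolves relative paths against its own record.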

A robust system needs to explain why an operation has failed when an error occurs. However, to cut down implementation time in this project, you need not report detailed error codes when something goes wrong: returning -1 from the function (or some other appropriate error value) is enough.

File system requirements. Your file system should support variable-sized files via a Unix-like inode mechanism, and reuse of blocks from unlinked (deleted) files. It is vital that your file system has concurrency control, so that it can cope with simultaneous accesses by multiple threads.
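One possible way to reuse blocks from unlinked files is a free list. The sketch below keeps the list in memory purely for illustration: in a real file system the "next" pointer would more likely be stored inside each free block on disk, with the head recorded in the superblock. The names free_list_t, block_alloc and block_free, and the tiny NBLOCKS value, are assumptions made for this example.

```c
#include <assert.h>

#define NBLOCKS  16     /* tiny disk, for illustration only */
#define NO_BLOCK (-1)

/* Hypothetical in-memory free list: next[b] is the free block after b. */
typedef struct {
    int first_free;
    int next[NBLOCKS];
} free_list_t;

/* Chain together every block from first_data_block to the end of the disk. */
void free_list_init(free_list_t *fl, int first_data_block)
{
    int b;
    assert(first_data_block >= 0 && first_data_block <= NBLOCKS);
    for (b = 0; b < NBLOCKS; b++)
        fl->next[b] = NO_BLOCK;
    for (b = first_data_block; b + 1 < NBLOCKS; b++)
        fl->next[b] = b + 1;
    fl->first_free = (first_data_block < NBLOCKS) ? first_data_block : NO_BLOCK;
}

/* Pop the head of the list; returns NO_BLOCK when the disk is full. */
int block_alloc(free_list_t *fl)
{
    int b = fl->first_free;
    if (b != NO_BLOCK) {
        fl->first_free = fl->next[b];
        fl->next[b] = NO_BLOCK;
    }
    return b;
}

/* Push a block released by unlink back onto the list so it can be reused. */
void block_free(free_list_t *fl, int b)
{
    assert(b >= 0 && b < NBLOCKS);
    fl->next[b] = fl->first_free;
    fl->first_free = b;
}
```

Since several threads may allocate and free blocks at once, a real implementation would also need to protect the list with a lock or semaphore.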

Make sure to test your code extensively. Simple sequential tasks, such as creating files, creating directories, removing directories, etc. should be easy. But you should also test your code with concurrent accesses. Since you have a bit more than a week to do the project, we are not requiring you to write code which is robust to disk crashes or does crash recovery.

Overview of the disk simulator

The disk simulator is relatively straightforward. To create a disk, use the disk_create() function and provide a name for your disk. You can also specify some disk flags to control disk behavior, and give a maximum size for the disk. Of course, you should only create a disk if there is no disk with that name already; otherwise you will lose all the data on the disk. Use disk_startup() to use a disk you have previously created.

To begin using the disk, just issue disk requests through the disk_send_request() function (alternatively, you can use the disk_read(), disk_write(), etc, calls: these are translated into calls to disk_send_request() with the appropriate request type). The format of the requests is shown in disk.h. When (and if) requests complete, the disk controller signals the completion by raising an interrupt. As with previous assignments, you need to write an interrupt handler that will handle these interrupts appropriately.

The disk simulator is not a completely accurate simulation of a real disk's behaviour, but it does have the property of re-ordering queued requests to an arbitrary order before executing them (this seems annoying and pointless from our point of view, but as we'll see when we come to discuss disk scheduling in class, there are legitimate efficiency motives for doing this). In fact, an efficient controller will reorder requests quite aggressively. Consequently, if you have a series of blocks that need to be written with a well-defined order (e.g. block A before B before C), then you must, in your file-system code, make sure that you do not issue request B before the request for A has been completed.
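The ordering rule can be illustrated with a toy model. The code below is NOT the disk.h API: toy_disk_t, toy_submit, toy_drain and write_ordered are invented stand-ins, and the "controller" simply executes whatever is queued in reverse order to mimic arbitrary reordering. The point is only that keeping at most one ordered request outstanding preserves the order you intended.

```c
#include <assert.h>

#define MAX_PENDING 8
#define MAX_DONE    32

/* Toy stand-in for the disk controller (not the real simulator). */
typedef struct {
    int pending[MAX_PENDING];  /* queued block numbers */
    int npending;
    int done[MAX_DONE];        /* blocks in the order they were actually written */
    int ndone;
} toy_disk_t;

/* Queue a write request without waiting for it. */
void toy_submit(toy_disk_t *d, int block)
{
    assert(d->npending < MAX_PENDING);
    d->pending[d->npending++] = block;
}

/* Execute everything queued, last-in first-out, to mimic reordering. */
void toy_drain(toy_disk_t *d)
{
    while (d->npending > 0) {
        assert(d->ndone < MAX_DONE);
        d->done[d->ndone++] = d->pending[--d->npending];
    }
}

/* Ordered write: issue one request and wait for it to complete before
 * returning, so the next ordered write cannot overtake it. */
void write_ordered(toy_disk_t *d, int block)
{
    toy_submit(d, block);
    toy_drain(d);   /* stands in for blocking until the completion interrupt */
}
```

In your file system, the waiting step would presumably be a thread blocking on a semaphore that your disk interrupt handler signals when the request completes.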

Since you do not have to support mount points, your minithreads system will only be able to support one filesystem (on one disk). This filesystem should reside on a virtual disk that uses the file MINIFILESYSTEM in the directory with your code. Write a C program, called mkfs.exe, that creates an empty filesystem on this disk, containing only one directory (the root directory) with no entries.

Guidelines for filesystem design

Disk organization: Use the first block of the disk as the superblock, followed by the disk blocks that contain the inodes (about 10% of the disk). Use the rest of the disk for data blocks.
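The layout described above can be computed from the total number of blocks. This is a minimal sketch; layout_t and compute_layout are hypothetical names, and rounding the inode region down to a tenth of the disk (with a minimum of one block) is just one reasonable reading of "about 10%".

```c
#include <assert.h>

/* Hypothetical layout descriptor; all fields are block numbers or counts. */
typedef struct {
    int superblock;    /* always block 0 */
    int inode_start;   /* first block of the inode region */
    int inode_blocks;  /* blocks reserved for inodes (about 10% of the disk) */
    int data_start;    /* first data block */
    int data_blocks;   /* number of data blocks */
} layout_t;

/* Split a disk of total_blocks blocks into superblock, inodes and data. */
layout_t compute_layout(int total_blocks)
{
    layout_t l;
    assert(total_blocks >= 3);  /* superblock + one inode block + one data block */
    l.superblock = 0;
    l.inode_start = 1;
    l.inode_blocks = total_blocks / 10;
    if (l.inode_blocks < 1)
        l.inode_blocks = 1;
    l.data_start = l.inode_start + l.inode_blocks;
    l.data_blocks = total_blocks - l.data_start;
    return l;
}
```

For a 1000-block disk this gives block 0 for the superblock, blocks 1 to 100 for inodes, and blocks 101 to 999 for data.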

Superblock: The superblock contains global information about the file system. You can store in the superblock things like the first free inode (if the free inodes are organized in a linked list), the first free data block (more about this later), the disk address of the root inode (the entry point into the file system), the total number of inodes and blocks, the overall size of the file system, and anything else you consider useful. Also, put a predefined number in the first 4 bytes (called a "magic number", for instance, 0x8888) that helps you determine whether or not you have a legitimate file system on a disk.
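A superblock along these lines might look like the following sketch. The field names, the BLOCK_SIZE value and the pack/unpack helpers are assumptions made for illustration; check disk.h for the real block size. The magic number occupies the first 4 bytes of the block because it is the first field of the structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   4096      /* assumed; use the block size from disk.h */
#define MINIFS_MAGIC 0x8888u   /* example magic number from the text */

/* Hypothetical superblock; every field is 4 bytes, so there is no padding. */
typedef struct {
    uint32_t magic;             /* must be first: identifies a valid file system */
    int32_t  total_blocks;
    int32_t  total_inodes;
    int32_t  root_inode;        /* entry point into the file system */
    int32_t  first_free_inode;
    int32_t  first_free_block;
} superblock_t;

/* Copy the superblock into a zeroed block buffer, ready to write to block 0. */
void superblock_pack(const superblock_t *sb, char *block)
{
    assert(sizeof *sb <= BLOCK_SIZE);
    memset(block, 0, BLOCK_SIZE);
    memcpy(block, sb, sizeof *sb);
}

/* Read the superblock back from a block buffer.
 * Returns 0 if the magic number matches, -1 otherwise. */
int superblock_unpack(const char *block, superblock_t *sb)
{
    memcpy(sb, block, sizeof *sb);
    return (sb->magic == MINIFS_MAGIC) ? 0 : -1;
}
```

A program such as mkfs.exe would fill in the structure and write the packed block to block 0; at startup, a failed magic check tells you the disk does not hold a file system yet.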

Inodes contain information about the file or directory they represent. Any relevant information about the file except the name should be kept in its inode. This might include: type (file or directory), size, the disk address of the next free inode (to maintain the list of free inodes), etc. The inode also contains information about the data blocks used by the file. You have to be able to address 11 direct blocks and one indirect block (that contains pointers to other disk blocks). You can find more information about direct and indirect blocks in your textbook.

Directories: Directories are "special" files that contain the mapping between file or directory names and inode numbers (or disk addresses). The only differences between a directory and a file are: directories cannot be deleted with unlink and can be deleted only if they are empty; they have a fixed format (for example, you can use an ASCII representation with the inode number and the file name on the same line, separated by a tab, or a binary representation; just make sure the file name can be at least 256 characters); and they have the type set to DIRECTORY. It's fine if you use a linear search to find the inode number that corresponds to a particular filename in the directory.
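For the binary representation, a fixed-size directory entry and a linear search could look like the sketch below. The entry layout and the helper names (dir_entry_t, dir_init, dir_lookup, dir_add) are hypothetical, and the entries are held in a plain array here; in your file system they would live in the directory's data blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME_LEN 256   /* names of up to 256 characters, as required */
#define NO_INODE     (-1)

/* Hypothetical fixed-size directory entry (binary representation). */
typedef struct {
    int32_t inode;                   /* NO_INODE marks an unused slot */
    char    name[MAX_NAME_LEN + 1];  /* NUL-terminated name */
} dir_entry_t;

/* Mark every slot of a new directory as unused. */
void dir_init(dir_entry_t *entries, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        entries[i].inode = NO_INODE;
        entries[i].name[0] = '\0';
    }
}

/* Linear search: returns the inode number for name, or NO_INODE. */
int dir_lookup(const dir_entry_t *entries, int count, const char *name)
{
    int i;
    assert(entries != NULL && name != NULL);
    for (i = 0; i < count; i++)
        if (entries[i].inode != NO_INODE && strcmp(entries[i].name, name) == 0)
            return entries[i].inode;
    return NO_INODE;
}

/* Add a mapping in the first unused slot.
 * Returns 0 on success, -1 if the name is too long, already present,
 * or the directory has no free slot. */
int dir_add(dir_entry_t *entries, int count, const char *name, int32_t inode)
{
    int i;
    if (strlen(name) > MAX_NAME_LEN || dir_lookup(entries, count, name) != NO_INODE)
        return -1;
    for (i = 0; i < count; i++) {
        if (entries[i].inode == NO_INODE) {
            entries[i].inode = inode;
            strcpy(entries[i].name, name);
            return 0;
        }
    }
    return -1;
}
```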

Paths have the general form /dir1/dir2/.../dirn/filename. The first / refers to the root of the file system.
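Resolving a path means walking it one component at a time, looking each name up in the directory found so far. The helper below only does the splitting step; path_next and path_is_absolute are hypothetical names, and treating repeated slashes as a single separator is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAME_LEN 256

/* A path that starts with '/' is resolved from the root directory. */
int path_is_absolute(const char *path)
{
    return path[0] == '/';
}

/* Copy the next component of *path into out (at least MAX_NAME_LEN + 1
 * bytes) and advance *path past it.
 * Returns 1 if a component was copied, 0 at the end of the path,
 * and -1 if the component is longer than MAX_NAME_LEN. */
int path_next(const char **path, char *out)
{
    const char *p;
    size_t len;
    assert(path != NULL && *path != NULL && out != NULL);
    p = *path;
    while (*p == '/')          /* skip separators */
        p++;
    if (*p == '\0') {
        *path = p;
        return 0;
    }
    len = strcspn(p, "/");     /* length up to the next '/' or the end */
    if (len > MAX_NAME_LEN)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    *path = p + len;
    return 1;
}
```

Calling path_next() repeatedly on /dir1/dir2/filename yields dir1, dir2 and filename in turn, then returns 0.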

How to get started

Make a backup copy of your code from the previous project.
Download the files for this project: minithreads3.zip. These comprise the disk device code, minifile interface, and a shell. There is also a replacement Makefile to let you compile filesystem support into minithreads.
Move the new files into the same directory as your minithreads code.

We've included a simple command shell in shell.c, which you can link against your minifile implementation. It should enable you to test your code from the command line.

Submissions

You should follow the generic submission guidelines to submit this project. Make sure you use the correct project number (3).

Extra challenges

Note: These suggestions for an extra challenge will be examined but not graded. They will have no impact on the class grades. They are here to provide some more advanced work for those who are interested.

Handle failure cases, e.g. five threads are concurrently writing to the disk when a system crash occurs (someone presses control-c). The file-system should not be left in an inconsistent state. You can set the failure rates to non-zero values to have the disk controller experience such errors occasionally, just like a real disk. (difficult)
Implement hard links to files. The Unix delete operation is called "unlink" because the use of inodes for storing file information, separate from the directory hierarchy, allows a file to have multiple names, even within the same directory. Every "name" simply points to the one inode for the file. To add an additional directory entry for a file (i.e. give it another name), the "link" system call is used. Unlink is the opposite of link: it removes a name. The implementation is complicated by the fact that you need to keep track of how many links exist to a file, so that you know when to remove it completely. (should not be too hard)