You work for a filesystem company at which a mishap has occurred. A programmer named Denbo Haxor was tasked with implementing a log-structured filesystem (LFS). Denbo Haxor was 10 days into the implementation of LFS when he went out to watch a meteor shower, where he was struck and vaporized by a falling meteorite. Your task is to pick up the partially finished code and fill in any missing gaps to yield a fully-functioning filesystem.
We know that Denbo's implementation was patterned after the canonical LFS implementation described in this research paper:
In log-structured file systems, all new data is written on the disk sequentially in a log-like structure. The log structured disk is divided into segments which consist of blocks. Each segment comprises a superblock and a set of data blocks. The superblock keeps track of which data blocks are free, while the data blocks in the segment hold the file and directory contents, as well as Inodes that point to and organize these data blocks.
The main function of Inodes is to point to the blocks the file/directory data spans. Each Inode can point to approximately 100 data blocks directly. When the number of data blocks is greater than an inode can directly point, they are stored in a datablock (called an indirect block) and a pointer to this indirect block is stored in the Inode.
Unlike the traditional Unix filesystem, Inodes in LFS do not have fixed positions on disk. The latest version of an Inode is written to the next available block in the current segment (if the current segment is full, it is flushed to disk, a new segment from the disk is loaded into memory and becomes the current segment). This approach causes the Inodes to be spread throughout the log, hence an additional data structure, called an InodeMap, is necessary to keep track of the latest position of any given Inode in use. The InodeMap keeps the mapping from virtual inode numbers to physical block addresses for every Inode.
The InodeMap is a critical data structure as it is required to locate any Inode (whose virtual Inode number is obtained by walking the directory hierarchy) for any given file or directory. It is flushed to the disk with a sync() command, which writes the InodeMap to the disk, along with a special Inode that points to all the blocks that house the InodeMap. The physical location of this special Inode is written to the superblock of the segment in which that Inode was written, along with the generation count of the InodeMap. Every flush increments the generation count, making it possible to identify the latest InodeMap on disk following a crash. On filesystem mount, the filesystem goes through every segment and checks every superblock to find and recover the latest InodeMap.
Writes and reads in LFS are very similar to writes and reads in a traditional Unix filesystem, except the position of the Inodes is not fixed. When a new file is being written, LFS first creates an in-memory Inode, writes the data blocks to the next available blocks in the segment, updates the in-memory Inode to point to the data blocks, and writes the Inode to the segment. It then modifies the parent directory block contents to include a directory entry (a name, inodeid tuple spanning 32 bytes, 28 of which hold the file name and 4 of which hold the Inode identifier). It writes the modified data blocks of the parent directory to the segment, followed by the modified Inode for the parent directory. Note that modifications do not have to propagate up the directory hierarchy, as the parent's virtual Inode number remains unaffected, and thus the parent's parent directory contents do not have to be changed.
Locating a desired file on disk is also analogous to the way this operation is performed on a traditional Unix filesystem. Suppose the user is trying to open file "/aaa/bbb/c". A search begins with the Inode for /. Each directory entry in / is examined to see if it matches "aaa". If there is a match, the virtual inode number is recovered from the directory entry. This is translated, through the InodeMap, into a physical address for that Inode. The process continues down the path for "bbb", and so on.
The aforementioned description comes from design documents found on Denbo's desk; there may well be some deviation between this description and the actual implementation. In all such cases, actual code trumps any English description of what is supposed to happen, and your task is to fix up the code to affect the intended action.
Denbo's LFS implementation is in Python, and comes with a bare bones shell. It can be invoked with "python Shell.py". The shell (should) support(s) the following operations:
Your primary task is to understand the operation of the current LFS implementation, find and fix all bugs, and add any required functionality to yield a full-fledged log-structured filesystem.
From what we could understand from the incomplete code, this subtask can be broken down into several sub-tasks, and the missing parts are identified with XXX's in the code:
The Denbo LFS has a very minimalistic feel, similar to the early Unix filesystem. You'll find that the filesystem makes no distinction between files and directories; it is possible to treat a directory as if it were a file. This makes reading the directory simple, as no special code is necessary. But it also makes it possible to write over a directory and mess up its contents. So don't do that, or if you do, live with the consequences!
Software artifacts of this level of complexity are sufficiently difficult to make manual grading infeasible. Your solutions will be graded against a battery of automatic tests. Such automated evaluation leaves no wiggle-room -- if the system crashes for any reason, no matter how small, it's as if it doesn't work at all. So be sure to test your code thoroughly, and turn in only a coherent set of files that fulfill the requisite functionality.
Please turn in your code along with a readme.txt file that describes how much of the required and extra-credit tasks you completed. Also turn in all code you used to test your implementation.
Start by running Denbo's code and creating a new disk by typing "mkfs". You can then try to create a file named /a of size 10, with "create a 10". You will see that this will fail, as the file creation code will look for the root directory to see if a already exists, fail to find the root, and thus return an error. You may want to turn on the DEBUG variables in Disk.py and Segment.py to get some insight into what is happening inside the filesystem. You will then probably want to start in the shell, and insert print statements extensively to see how the code is navigating through the different components. Once you figure out a way to make searchdir and write_new_block to work (and there are many such ways), you should have a mostly functioning system.
While the amount of code you need to write to complete this assignment is not much, keep in mind that writing those lines will require, like most realistic tasks in the real world, reverse engineering and understanding an existing, intricate code base. This may seem frustrating at first; start by inserting print statements along a particular code path (e.g. file create), and expand from there. If you feel so inclined, you can delete as much of the existing code as you like and rewrite it from scratch; however, we strongly advise against this, as matching the same functionality will take a lot of time.
We are always here to help. If you have any questions, contact the course staff at cs4410staff at systems.cs.cornell.edu