Lecture 19: Filesystem consistency

Free space

A filesystem needs to keep track of the unused blocks so that it can allocate new blocks when creating new files or expanding existing files. There are a number of strategies for keeping track of the free space:

Filesystem consistency

A filesystem consists of two data structures: the directory tree and the free list. These data structures must be kept in sync; we must maintain the invariant that every sector on the disk appears either in the free list or in the inode structure, but not both. In addition, each sector should appear once (with the exception of inodes, which should appear as many times as their reference counts indicate).

To maintain this invariant, block allocation and deallocation will usually require at least two steps: one to modify the free list, one to modify the directory structure.

However, a power failure can occur between these operations; because disks are persistent, the half-finished update survives the reboot, which can leave the filesystem in an inconsistent state:
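For illustration, one such state can be reproduced in a small simulation. This is only a sketch: `ToyDisk`, `PowerFailure`, and `orphaned_sectors` are hypothetical names invented here, and the "disk" is an in-memory set and dictionary rather than real storage. The allocation removes a sector from the free list (step one) and is interrupted before the sector is recorded in the file (step two).

```python
class PowerFailure(Exception):
    """Simulated loss of power in the middle of an operation."""


class ToyDisk:
    """Hypothetical in-memory stand-in for the two on-disk structures."""

    def __init__(self, num_sectors):
        self.free_list = set(range(num_sectors))  # all sectors start free
        self.files = {}                           # file name -> list of sectors

    def allocate(self, name, crash_between_steps=False):
        # Step 1: take a sector off the free list.
        sector = min(self.free_list)
        self.free_list.remove(sector)
        if crash_between_steps:
            raise PowerFailure("power lost after step 1")
        # Step 2: record the sector in the file's block list.
        self.files.setdefault(name, []).append(sector)
        return sector


def orphaned_sectors(disk, num_sectors):
    """Sectors that are neither free nor referenced by any file."""
    used = {s for sectors in disk.files.values() for s in sectors}
    return set(range(num_sectors)) - disk.free_list - used


disk = ToyDisk(4)
disk.allocate("a.txt")                      # completes both steps
try:
    disk.allocate("b.txt", crash_between_steps=True)
except PowerFailure:
    pass                                    # "reboot": the state is kept as-is
print(orphaned_sectors(disk, 4))            # prints {1}
```

Sector 1 is now in neither structure, so the invariant is broken: the sector has leaked. Performing the two steps in the opposite order would instead risk a sector that is both in a file and on the free list.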

Normally we would maintain invariants by breaking them only inside a critical section: for example, we could acquire a lock before modifying the free list and the file system, and release it only afterwards. In this context, this is difficult: to implement the lock, we'd need to force the universe to acquire the lock before kicking out our power cord.

We could try to come up with a complex protocol ensuring that no matter when we are interrupted, we are in a consistent state. However, we may be thwarted because the disk is allowed to reorder the writes. If the correctness of a protocol relies on the order in which writes take place, we must sync the disk between the writes: wait for the disk to acknowledge that the first write has been stored before beginning the second. This is an expensive operation.
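From user space, the closest analogue to syncing between writes is a call such as `os.fsync`. The following is a minimal sketch, assuming an ordinary file stands in for the disk; `write_in_order` and the file name are hypothetical. Whether `fsync` really waits for the data to reach stable storage depends on the operating system and the device, so treat this as an illustration of the ordering rather than a guarantee.

```python
import os
import tempfile


def write_in_order(path, first, second):
    """Write `first`, wait until it is reported durable, then write `second`."""
    with open(path, "wb") as f:
        f.write(first)
        f.flush()              # push Python's buffer to the operating system
        os.fsync(f.fileno())   # block until the OS reports the data is stored
        f.write(second)        # only now is the second write issued
        f.flush()
        os.fsync(f.fileno())


path = os.path.join(tempfile.mkdtemp(), "ordered.bin")
write_in_order(path, b"free-list update\n", b"directory update\n")
```

Each `fsync` stalls the program until the device responds, which is why a protocol that needs one between every pair of dependent writes is slow.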

Maintaining consistency

Uninterruptible power supply

One solution is to use an uninterruptible power supply (UPS): a battery with a mechanism for raising an interrupt if the power is about to fail. With a UPS, we can avoid starting any writes if we know the power is about to go out.

This is our trick for forcing the universe to take a lock before killing the power.

Recovery on reboot

An alternative solution is to check the filesystem for consistency when we reboot the machine. We can traverse the entire file system and free list to build a table telling us whether each sector occurs in exactly one of the two.

If we detect an orphaned sector (one that appears in neither structure), we can simply add it to the free list. A sector that appears twice in the directory structure could be duplicated, so that each file referring to it gets its own copy, or simply removed (some systems actually move duplicated blocks into a special "lost-and-found" directory, allowing the user to examine them and recover them manually if necessary).
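The table-building pass and the simplest repairs can be sketched as follows. This is a toy model under stated assumptions, not how any real checker is implemented: `check_and_repair` is a hypothetical name, the free list is a set of sector numbers, and the directory structure is flattened to a dictionary from file name to sector list. The policy for a sector that is both free and in use (trust the file) is also an assumption.

```python
from collections import Counter


def check_and_repair(num_sectors, free_list, files):
    """Return (repaired free list, sectors referenced more than once).

    free_list: set of sector numbers; files: dict of name -> list of sectors.
    """
    # Build the table: how many times each sector is referenced by a file.
    refs = Counter(s for sectors in files.values() for s in sectors)

    repaired = set(free_list)
    duplicates = []
    for sector in range(num_sectors):
        in_free = sector in free_list
        count = refs[sector]
        if count == 0 and not in_free:
            repaired.add(sector)        # orphan: return it to the free list
        elif count > 0 and in_free:
            repaired.discard(sector)    # free and in use: trust the file (assumed policy)
        if count > 1:
            duplicates.append(sector)   # needs copying or removal; left to the caller
    return repaired, duplicates


free, dups = check_and_repair(
    num_sectors=6,
    free_list={3, 4},
    files={"a": [0, 1], "b": [1, 3]},
)
print(sorted(free), dups)               # prints [2, 4, 5] [1]
```

Sectors 2 and 5 were orphaned and are returned to the free list, sector 3 is removed from it because file "b" uses it, and sector 1 is reported as shared by two files. Note that the loop visits every sector, which is the cost that grows with the size of the disk.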

In Unix, the fsck tool (named for what it does: filesystem check, and also for what you say when you have to run it) is used to check and recover a filesystem.

Journaling

As disks (and thus file systems) got larger, the process of traversing an entire filesystem became prohibitively expensive. To solve this, we can use journaling:

When recovering, only the blocks that are part of incomplete operations in the journal need to be inspected for inconsistency.
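The recovery scan can be sketched as below. The journal layout here is purely an assumption made for illustration: each operation writes a hypothetical `("begin", id, blocks)` record listing the blocks it intends to modify, and a `("commit", id)` record once it has finished. Real journaling filesystems use their own on-disk formats.

```python
def blocks_to_inspect(journal):
    """Return the blocks touched by operations that began but never committed."""
    pending = {}  # operation id -> blocks it declared it would modify
    for record in journal:
        kind, op_id = record[0], record[1]
        if kind == "begin":
            pending[op_id] = set(record[2])
        elif kind == "commit":
            pending.pop(op_id, None)    # finished cleanly, nothing to check
    suspect = set()
    for blocks in pending.values():
        suspect |= blocks
    return suspect


journal = [
    ("begin", 1, [10, 11]),
    ("commit", 1),
    ("begin", 2, [11, 42]),
    # power failed here: operation 2 never committed
]
print(sorted(blocks_to_inspect(journal)))   # prints [11, 42]
```

The work done at reboot is proportional to the length of the journal rather than to the size of the disk, which is the point of the technique.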