The Design and Implementation of a Log-Structured File System
Motivation
- Main-memory caching makes reads fast
- Unix FFS writes are slow and use disk bandwidth ineffectively
- Buffer writes in memory, then write the whole buffer to a log
  sequentially -- higher throughput to disk
- Also achieves faster crash recovery
Basic operation
- Unix semantics for files
- Write file blocks and inode to log together
- Since inode positions change, keep an inode map (small enough to
  fit in memory)
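The basic operation above can be sketched as follows; the class and method names (`Log`, `LFS`, `write_file`) are illustrative assumptions, not structures from the paper:

```python
# Minimal sketch of LFS basic operation: append file blocks and the
# inode to the log together, then update an in-memory inode map.
# All names and structures here are illustrative, not the paper's code.

class Log:
    def __init__(self):
        self.blocks = []              # the on-disk log, modeled as a list

    def append(self, block):
        self.blocks.append(block)     # sequential write at the log's tail
        return len(self.blocks) - 1   # "disk address" of the block

class LFS:
    def __init__(self):
        self.log = Log()
        self.inode_map = {}           # inode number -> log address of inode

    def write_file(self, inum, data_blocks):
        # Write data blocks first, recording their new addresses...
        addrs = [self.log.append(b) for b in data_blocks]
        # ...then write the inode (which points at those addresses)
        # into the same log, and record its position in the inode map.
        inode = {"inum": inum, "blocks": addrs}
        self.inode_map[inum] = self.log.append(inode)

    def read_file(self, inum):
        # Reads go through the inode map to find the latest inode.
        inode = self.log.blocks[self.inode_map[inum]]
        return [self.log.blocks[a] for a in inode["blocks"]]

fs = LFS()
fs.write_file(7, [b"hello ", b"world"])
fs.write_file(7, [b"bye"])            # rewrite: old blocks become inactive
print(fs.read_file(7))                # -> [b'bye']
```

Note how the second `write_file` leaves the old data blocks and inode in the log but unreferenced, which is exactly what motivates segment cleaning in the next section.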
Log management
- Writes to log render previous versions of blocks inactive
- Log requires large amounts of contiguous free space
- Divide disk into 1-MB segments and reclaim inactive blocks via
cleaning (copying) of segments
- Want lots of empty segments, lots of (almost-) full segments, few
segments "in between"
- Collect "cold" (rarely-modified) data into distinct segments so
that these will not need to be cleaned
- Benefit-cost ratio to determine which segments to clean
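The benefit-cost policy can be sketched directly from the paper's formula, benefit/cost = ((1 - u) * age) / (1 + u), where u is the segment's live-data fraction; the example segment list is made-up data:

```python
# Sketch of the cost-benefit cleaning policy: clean the segment that
# maximizes ((1 - u) * age) / (1 + u), where u is the fraction of live
# data and age is the age of the youngest data in the segment.

def benefit_cost(u, age):
    # Cleaning reads the whole segment (cost 1) and rewrites the live
    # data (cost u), so cost = 1 + u; the benefit is the space freed
    # (1 - u), weighted by age so cold segments are cleaned even at
    # relatively high utilization.
    return (1 - u) * age / (1 + u)

def pick_segment(segments):
    # segments: list of (name, utilization, age) tuples
    return max(segments, key=lambda s: benefit_cost(s[1], s[2]))

segments = [
    ("hot-1",  0.75, 5),    # hot and fairly full: poor candidate
    ("cold-1", 0.90, 500),  # cold but nearly full: wins due to age
    ("hot-2",  0.20, 10),   # hot and mostly empty: cheap to clean
]
print(pick_segment(segments)[0])   # -> cold-1
```

This is the point of the age term: a greedy "lowest u" policy would never clean `cold-1`, but its blocks are stable, so cleaning it once yields free space that stays free.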
Crash recovery
- Uses checkpointing with two checkpoint regions in fixed
positions on disk
- Write file data, indirect blocks, inodes, inode map and segment
usage table to disk
- Addresses of blocks in inode map and segment usage table are
written to checkpoint region, followed by timestamp
- On reboot, use the newer checkpoint region (compare timestamps),
  then roll forward through the log from that checkpoint
- Recovery restores consistency of inode map and segment usage table
with log
- Consistency between directories and inodes solved by directory
operation log records
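The recovery steps above can be sketched as follows; the record layout and field names here are illustrative assumptions, and the roll-forward is reduced to replaying inode-map updates:

```python
# Sketch of LFS recovery: read both fixed checkpoint regions, trust the
# one with the later timestamp (a crash mid-checkpoint leaves that
# region's timestamp stale), then roll forward through log records
# written after the checkpoint to restore the inode map's consistency
# with the log. Structures are illustrative, not the paper's layout.

def choose_checkpoint(region_a, region_b):
    # The timestamp is written last, so a torn checkpoint write
    # keeps the old timestamp and loses this comparison.
    return max(region_a, region_b, key=lambda r: r["timestamp"])

def recover(region_a, region_b, log_tail):
    cp = choose_checkpoint(region_a, region_b)
    inode_map = dict(cp["inode_map"])
    # Roll forward: reapply inode updates newer than the checkpoint.
    for rec in log_tail:
        if rec["timestamp"] > cp["timestamp"]:
            inode_map[rec["inum"]] = rec["addr"]
    return inode_map

a = {"timestamp": 100, "inode_map": {1: 10, 2: 11}}
b = {"timestamp": 160, "inode_map": {1: 40, 2: 11}}
tail = [{"timestamp": 170, "inum": 2, "addr": 55}]
print(recover(a, b, tail))   # -> {1: 40, 2: 55}
```

Because only the tail of the log since the last checkpoint must be scanned, recovery time is bounded by the checkpoint interval rather than the size of the disk.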
Performance
- Raises disk-bandwidth utilisation for writes from 5-10% (Berkeley
  Fast File System) to about 70%
- Not as good for random writes followed by sequential reads, since
storage is not sequential
- Checkpoint interval of 1 hour gives 1 second recovery time