The Design and Implementation of a Log-Structured File System
Motivation
- Main-memory caching makes reads fast
- Unix FFS writes are slow and use disk bandwidth ineffectively
- Buffer writes in memory, then write the whole buffer to a log
  sequentially -- higher throughput to disk
- Also achieves faster crash recovery
Basic operation
- Unix semantics for files
- Write file blocks and inode to log together
- Since inode positions change, keep an inode map (small enough to
  fit in memory)
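The basic operation above can be sketched as follows; the class and method names (`Log`, `LFS`, `write_file`) are illustrative assumptions, not structures from the paper:

```python
# Minimal sketch of LFS basic operation: append file blocks and the
# inode to the log together, then update an in-memory inode map.
# All names and structures here are illustrative, not the paper's code.

class Log:
    def __init__(self):
        self.blocks = []              # the on-disk log, modeled as a list

    def append(self, block):
        self.blocks.append(block)     # sequential write at the log's tail
        return len(self.blocks) - 1   # "disk address" of the block

class LFS:
    def __init__(self):
        self.log = Log()
        self.inode_map = {}           # inode number -> log address of inode

    def write_file(self, inum, data_blocks):
        # Write data blocks first, recording their new addresses...
        addrs = [self.log.append(b) for b in data_blocks]
        # ...then write the inode (which points at those addresses)
        # into the same log, and record its position in the inode map.
        inode = {"inum": inum, "blocks": addrs}
        self.inode_map[inum] = self.log.append(inode)

    def read_file(self, inum):
        # Reads go through the inode map to find the latest inode.
        inode = self.log.blocks[self.inode_map[inum]]
        return [self.log.blocks[a] for a in inode["blocks"]]

fs = LFS()
fs.write_file(7, [b"hello ", b"world"])
fs.write_file(7, [b"bye"])            # rewrite: old blocks become inactive
print(fs.read_file(7))                # -> [b'bye']
```

Note how the second `write_file` leaves the old data blocks and inode in the log but unreferenced, which is exactly what motivates segment cleaning in the next section.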
Log management
- Writes to log render previous versions of blocks inactive
- Log requires large amounts of contiguous free space
- Divide disk into 1-MB segments and reclaim inactive blocks via
cleaning (copying) of segments
- Want lots of empty segments, lots of (almost-) full segments, few
segments "in between"
- Collect "cold" (rarely-modified) data into distinct segments so
that these will not need to be cleaned
- Benefit-cost ratio to determine which segments to clean
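The benefit-cost policy can be sketched directly from the paper's formula, benefit/cost = ((1 - u) * age) / (1 + u), where u is the segment's live-data fraction; the example segment list is made-up data:

```python
# Sketch of the cost-benefit cleaning policy: clean the segment that
# maximizes ((1 - u) * age) / (1 + u), where u is the fraction of live
# data and age is the age of the youngest data in the segment.

def benefit_cost(u, age):
    # Cleaning reads the whole segment (cost 1) and rewrites the live
    # data (cost u), so cost = 1 + u; the benefit is the space freed
    # (1 - u), weighted by age so cold segments are cleaned even at
    # relatively high utilization.
    return (1 - u) * age / (1 + u)

def pick_segment(segments):
    # segments: list of (name, utilization, age) tuples
    return max(segments, key=lambda s: benefit_cost(s[1], s[2]))

segments = [
    ("hot-1",  0.75, 5),    # hot and fairly full: poor candidate
    ("cold-1", 0.90, 500),  # cold but nearly full: wins due to age
    ("hot-2",  0.20, 10),   # hot and mostly empty: cheap to clean
]
print(pick_segment(segments)[0])   # -> cold-1
```

This is the point of the age term: a greedy "lowest u" policy would never clean `cold-1`, but its blocks are stable, so cleaning it once yields free space that stays free.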
Crash recovery
- Uses checkpointing with two checkpoint regions in fixed
positions on disk
- Write file data, indirect blocks, inodes, inode map and segment
usage table to disk
- Addresses of blocks in inode map and segment usage table are
written to checkpoint region, followed by timestamp
- On reboot, use the newer checkpoint region (compare timestamps),
  then roll forward through the log from that checkpoint
- Recovery restores consistency of inode map and segment usage table
with log
- Consistency between directories and inodes solved by directory
operation log records
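The recovery steps above can be sketched as follows; the record layout and field names here are illustrative assumptions, and the roll-forward is reduced to replaying inode-map updates:

```python
# Sketch of LFS recovery: read both fixed checkpoint regions, trust the
# one with the later timestamp (a crash mid-checkpoint leaves that
# region's timestamp stale), then roll forward through log records
# written after the checkpoint to restore the inode map's consistency
# with the log. Structures are illustrative, not the paper's layout.

def choose_checkpoint(region_a, region_b):
    # The timestamp is written last, so a torn checkpoint write
    # keeps the old timestamp and loses this comparison.
    return max(region_a, region_b, key=lambda r: r["timestamp"])

def recover(region_a, region_b, log_tail):
    cp = choose_checkpoint(region_a, region_b)
    inode_map = dict(cp["inode_map"])
    # Roll forward: reapply inode updates newer than the checkpoint.
    for rec in log_tail:
        if rec["timestamp"] > cp["timestamp"]:
            inode_map[rec["inum"]] = rec["addr"]
    return inode_map

a = {"timestamp": 100, "inode_map": {1: 10, 2: 11}}
b = {"timestamp": 160, "inode_map": {1: 40, 2: 11}}
tail = [{"timestamp": 170, "inum": 2, "addr": 55}]
print(recover(a, b, tail))   # -> {1: 40, 2: 55}
```

Because only the tail of the log since the last checkpoint must be scanned, recovery time is bounded by the checkpoint interval rather than the size of the disk.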
Performance
- Raises disk-bandwidth utilisation for writes from 5-10% (Berkeley
  Fast File System) to about 70%
- Not as good for random writes followed by sequential reads, since
storage is not sequential
- Checkpoint interval of 1 hour gives 1 second recovery time