Lecture 16: Disks and RAID

Disk

We discussed disk layout and different algorithms for scheduling the disk head:
- fifo
- sstf (shortest seek time first)
- look (elevator algorithm)
- scan (fairer elevator algorithm)
- c-look (circular look)
- c-scan (circular scan)

See the lecture slides from spring 2017 for details.
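As a rough illustration of three of the schedulers named above, the sketch below orders a queue of track requests and totals the head movement. The function names, the request list, and the starting head position are all hypothetical, and `look` assumes the head is initially sweeping toward higher track numbers; the slides may define the details differently.

```python
def fifo(head, requests):
    """Serve requests in arrival order (head position is ignored)."""
    return list(requests)


def sstf(head, requests):
    """Repeatedly serve the pending request closest to the current head position."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def look(head, requests):
    """Sweep up to the highest pending request, then reverse and sweep down."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down


def total_seek(head, order):
    """Total number of tracks the head crosses when serving `order`."""
    distance = 0
    for track in order:
        distance += abs(track - head)
        head = track
    return distance


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]  # made-up track numbers
    for scheduler in (fifo, sstf, look):
        order = scheduler(53, queue)
        print(scheduler.__name__, order, total_seek(53, order))
```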

RAID overview

RAID stands for Redundant Array of Inexpensive Disks (industry prefers to read the I as Independent).

Contrast to SLED: Single Large Expensive Disk.
- RAID is cheaper (economy of scale: small/cheap disks are common)
- RAID has better failure characteristics (if your SLED fails, you're out of luck)

With RAID, many disks are connected to a RAID controller, a hardware device that manages the disks and presents a single disk interface to the operating system.

RAID arrays can be organized in many different ways to improve performance and reliability. Below we discuss the key ideas behind the standard RAID levels.

Striping (RAID 0)

With striping, blocks of the filesystem are spread across the disks:

D1 D2 D3 D4
B1 B2 B3 B4
B5 B6 B7 B8
B9 B10 B11 B12
... ... ... ...
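The layout in the table can be expressed as simple modular arithmetic. The sketch below assumes 1-based block, disk, and row numbers to match the table; `locate` is a hypothetical helper name, not part of any real controller interface.

```python
def locate(block, n_disks):
    """Map a 1-based block number to its (disk, row) position, both 1-based."""
    index = block - 1            # switch to 0-based arithmetic
    disk = index % n_disks + 1   # consecutive blocks go to consecutive disks
    row = index // n_disks + 1   # a new row starts after every n_disks blocks
    return disk, row


if __name__ == "__main__":
    for block in (1, 7, 12):
        print(f"B{block} ->", locate(block, 4))
```

Because consecutive blocks land on different disks, a long sequential transfer keeps all of the disks busy at once.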

Advantages:
- sequential read and write throughput very high: can use all N disk heads simultaneously
- stores N disks worth of data

Disadvantages:
- increased risk of failure (if any one of the N disks fails, the entire filesystem is lost)
- if each disk independently fails with probability 10%, the chance that at least one of 4 disks fails is 1 - 0.9^4, or about 34%

Mirroring (RAID 1)

Mirroring gives redundancy by copying all data to all disks:

D1 D2 D3 D4
B1 B1 B1 B1
B2 B2 B2 B2
B3 B3 B3 B3
... ... ... ...

Advantages:
- good read throughput (can read from all disks)
- great failure tolerance (can recover from the failure of any N-1 of the N disks, and continue to service requests during recovery)
- if each disk independently fails with probability 10%, the chance that all 4 disks fail is 0.1^4 = 0.01%

Disadvantages:
- expensive: can only store 1 disk's worth of data
- bad write throughput: every write must go to all disks, so writes are as slow as the slowest disk

Parity (RAID 2-5)

Parity can be used instead of Hamming codes to handle single known errors (by known errors, we mean that the controller is notified when a disk fails). The parity p of bits b0, b1, b2, ..., bn is simply the exclusive or of all of the bits. In general,

p  = b0 + b1 + ... + bn
0  = p + p
   = b0 + b1 + ... + bn + p
bi = 0 + bi
   = b0 + b1 + ... + b(i-1) + (bi + bi) + b(i+1) + ... + bn + p
   = b0 + b1 + ... + b(i-1) + 0 + b(i+1) + ... + bn + p

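Here + denotes exclusive or. The final equality shows that a lost bit bi is the exclusive or of the surviving bits and the parity.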
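The same identity applies bit by bit to whole blocks. The sketch below computes a parity block and rebuilds one lost block from the survivors; `parity` and `reconstruct` are hypothetical names, and the blocks are assumed to be byte strings of equal length.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing block: XOR of the survivors and the parity."""
    return parity(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
    p = parity(data)
    # Pretend the disk holding data[1] failed and the controller knows it.
    rebuilt = reconstruct([data[0], data[2]], p)
    print(rebuilt == data[1])
```

Note that this only works because the controller knows which block is missing; parity alone cannot locate an error.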

RAID 2 uses bit-level striping with Hamming-code error correction rather than simple parity. RAID 3 uses byte-level striping with a dedicated parity disk. RAID 4 uses block-level striping with a dedicated parity disk:

D1 D2 D3 D4
B1 B2 B3 P1-3
B4 B5 B6 P4-6
B7 B8 B9 P7-9
B10 B11 B12 P10-12
... ... ... ...

RAID 4 requires every write to access the parity disk, which makes the parity disk a bottleneck for writes and can cause more wear on it. RAID 5 stripes the parity across all of the disks:

D1 D2 D3 D4
B1 B2 B3 P1-3
B4 B5 P4-6 B6
B7 P7-9 B8 B9
P10-12 B10 B11 B12
B13 B14 B15 P13-15
B16 B17 P16-18 B18
... ... ... ...
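The rotation of the parity block can be generated programmatically. The sketch below reproduces the pattern in the table, where parity starts on the last disk and moves one disk to the left on each stripe; real controllers may rotate parity differently, and `raid5_row` is a hypothetical name.

```python
def raid5_row(stripe, n_disks):
    """Return the labels stored on each disk for one stripe (0-based stripe index)."""
    data_per_stripe = n_disks - 1
    first = stripe * data_per_stripe + 1       # first data block in this stripe
    last = first + data_per_stripe - 1         # last data block in this stripe
    # Parity sits on the last disk for stripe 0 and shifts left each stripe.
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    labels = [f"B{b}" for b in range(first, last + 1)]
    labels.insert(parity_disk, f"P{first}-{last}")
    return labels


if __name__ == "__main__":
    for stripe in range(4):
        print(" ".join(raid5_row(stripe, 4)))
```

Since each stripe keeps its parity on a different disk, parity updates are spread over the whole array instead of landing on one disk.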

Advantages of RAID 5:

Disadvantages of RAID 5:

Reed-Solomon encoding (RAID 6)

As disks become larger, recovery takes longer and longer, increasing the probability that a second disk fails before recovery completes. Two simultaneous failures will completely destroy a RAID 5 array.

Reed-Solomon codes generalize parity to allow two or more parity bits to be computed. With two bits of parity, one can correct two known failures, so two simultaneous disk failures can be handled.

RAID 6 uses two parity blocks per stripe, striped across the disks.
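To make the idea of a second, independent parity concrete, here is a minimal sketch of one common construction. The notes do not say which Reed-Solomon variant is meant, so everything specific here is an assumption: a "P+Q" scheme in which P is ordinary XOR parity and Q weights disk i by 2^i in the field GF(2^8) with reducing polynomial 0x11D. It works on one byte per data disk, and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero byte: a^254, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def pq(data):
    """Compute both parity bytes from one byte per data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                          # P: plain XOR parity
        q ^= gf_mul(gf_pow(2, i), d)    # Q: XOR of 2^i * d_i
    return p, q


def recover_two(data, x, y, p, q):
    """Rebuild data[x] and data[y] (known failures, x != y) from the rest plus P, Q."""
    a, b = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            a ^= d                        # a ends up as d_x + d_y
            b ^= gf_mul(gf_pow(2, i), d)  # b ends up as 2^x d_x + 2^y d_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(b ^ gf_mul(gy, a), gf_inv(gx ^ gy))
    return dx, a ^ dx


if __name__ == "__main__":
    data = [0x12, 0x34, 0x56, 0x78]
    p, q = pq(data)
    damaged = [0x12, None, 0x56, None]    # disks 1 and 3 are known to have failed
    print(recover_two(damaged, 1, 3, p, q) == (0x34, 0x78))
```

The two parities give two independent equations in the two unknown bytes, which is why two known failures can be solved for where a single XOR parity could handle only one.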

Nested RAID

One can also "nest" different RAID levels by connecting RAID controllers, instead of individual disks, to another RAID controller. For example, multiple RAID 0 controllers can be plugged into a single RAID 1 controller, providing some of the benefits of striping and some of the benefits of mirroring. This arrangement is called RAID 0+1 or RAID 01.

Similarly, multiple RAID 1 arrays can be placed in a single RAID 0 array. This is called RAID 1+0 or RAID 10.
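The two nestings can be modeled as predicates over a set of failed disks. The sketch below assumes four disks arranged as two groups of two; the disk names and function names are hypothetical.

```python
def raid10_survives(failed, mirror_pairs):
    """RAID 1+0 (striped mirrors): every mirrored pair must keep at least one disk."""
    return all(any(d not in failed for d in pair) for pair in mirror_pairs)


def raid01_survives(failed, stripe_sets):
    """RAID 0+1 (mirrored stripes): at least one stripe set must be fully intact."""
    return any(all(d not in failed for d in stripe) for stripe in stripe_sets)


if __name__ == "__main__":
    groups = [("D1", "D2"), ("D3", "D4")]
    failed = {"D1", "D3"}
    print("RAID 10 survives:", raid10_survives(failed, groups))
    print("RAID 01 survives:", raid01_survives(failed, groups))
```

Enumerating failure sets (for example with `itertools.combinations`) and applying these two predicates is one way to explore how the arrangements differ.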

Advantages and disadvantages of these RAID levels are left as an exercise.