Lecture 21 part 1: RAID

- RAID
  - RAID, SLED, RAID controller
- striping (RAID 0)
- mirroring (RAID 1)
- error correction
  - Hamming codes (RAID 2)
  - byte-level parity (RAID 3)
  - block-level parity (RAID 4)
- striped parity (RAID 5)
- multiple "parity" disks: Reed-Solomon encoding (RAID 6)
- nested RAID (0+1, 1+0)

RAID overview

RAID stands for Redundant Array of Inexpensive Disks (industry prefers to read the I as Independent).

Contrast to SLED: Single Large Expensive Disk.
- RAID is cheaper (economy of scale: small/cheap disks are common)
- RAID has better failure characteristics (if your SLED fails, you're out of luck)

With RAID, many disks are connected to a RAID controller, a hardware device that manages the disks and presents a single disk interface to the operating system.

RAID arrays can be organized in many different ways to improve performance and reliability. Below we discuss the key ideas behind the standard RAID levels.

Striping (RAID 0)

With striping, blocks of the filesystem are spread across the disks:

D1	D2	D3	D4
B1	B2	B3	B4
B5	B6	B7	B8
B9	B10	B11	B12
...	...	...	...

Advantages:
- sequential read and write throughput very high: can use all N disk heads simultaneously
- stores N disks' worth of data

Disadvantages:
- increased risk of failure (if any one of the N disks fails, the entire filesystem is lost)
- if single disk failure chance is 10%, then chance of failure for 4 disks is ~34% (1 - 0.9^4)
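
The striped layout and the failure figure can be sketched in a few lines of Python. This is only an illustration, not how a real controller works: the function names are made up for these notes, blocks and disks are numbered from 1 as in the table, and disk failures are assumed to be independent.

```python
def locate_block(block_number, num_disks):
    """Map a 1-based block number to (1-based disk, 0-based row) under striping."""
    index = block_number - 1
    return (index % num_disks) + 1, index // num_disks


def raid0_failure_probability(p, num_disks):
    """Chance that at least one disk fails, assuming independent failures.

    Under striping, losing any one disk loses the whole filesystem.
    """
    return 1 - (1 - p) ** num_disks


if __name__ == "__main__":
    # B7 lives on D3, in the second row of the table (row index 1).
    print(locate_block(7, 4))
    print(raid0_failure_probability(0.10, 4))
```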

Mirroring (RAID 1)

Mirroring gives redundancy by copying all data to all disks:

D1	D2	D3	D4
B1	B1	B1	B1
B2	B2	B2	B2
B3	B3	B3	B3
...	...	...	...

Advantages:
- good read throughput (can read from all disks)
- great failure tolerance (can recover from the failure of up to N-1 of the N disks, and continue to service requests during recovery)
- if single disk failure chance is 10%, chance of failure for 4 disks is 0.01% (all four must fail: 0.1^4)

Disadvantages:
- expensive: can only store 1 disk's worth of data
- bad write throughput: writes as slow as slowest disk
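
A toy model of mirroring makes the trade-off concrete: every write touches every disk, while a read needs only one surviving copy. The class and function names below are hypothetical, each "disk" is just a Python dict, and failures are assumed to be independent.

```python
class MirroredArray:
    """Toy RAID 1: every block is written to every working disk."""

    def __init__(self, num_disks):
        # Each "disk" is a dict from block number to data; None marks a failed disk.
        self.disks = [dict() for _ in range(num_disks)]

    def write(self, block_number, data):
        # A write must reach every working disk, so it is as slow as the slowest one.
        for disk in self.disks:
            if disk is not None:
                disk[block_number] = data

    def fail(self, disk_index):
        self.disks[disk_index] = None

    def read(self, block_number):
        # Any surviving copy will do.
        for disk in self.disks:
            if disk is not None:
                return disk[block_number]
        raise IOError("all mirrors have failed")


def raid1_failure_probability(p, num_disks):
    """Chance that every disk fails, assuming independent failures."""
    return p ** num_disks
```

With four disks, the array above still answers reads after any three calls to `fail`.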

Error correction (RAID 2-5)

If you don't know when a disk has failed, you can use Hamming codes to detect and repair failures. A (7,4) Hamming code stores 4 bits of data using 7 bits. It can detect up to two failures (two of the seven bits being flipped) or repair one failure (if only one bit is flipped, the flipped bit can be identified and thus un-flipped), but not both at once: a decoder that repairs single flips will mis-repair a double flip.

RAID 2 uses bit-level striping with Hamming codes: an array of 7 disks can hold 4 disks' worth of data.
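
To see how a (7,4) Hamming code locates a flipped bit, here is a minimal sketch. It uses the conventional layout with parity bits at positions 1, 2 and 4; the function names are made up for these notes, and in a RAID 2 array each of the seven positions would correspond to one disk.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits; positions 1, 2 and 4 hold parity."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no flip was seen.

    Repairs at most one flipped bit. Two flips would be mis-repaired.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks, read as a binary number, give the 1-based position of the flip.
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                      # flip the bit at position 5
    print(hamming74_decode(word))
```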

Parity can be used instead of Hamming codes to handle single known errors (by known errors, we mean that the controller is notified when a disk fails). The parity p of bits b0, b1, b2, ..., bn is simply the exclusive or of all of the bits. In general,

p  = b0 + b1 + ... + bn
0  = p + p  = b0 + b1 + ... + bn + p
bi = 0 + bi = b0 + b1 + ... + b(i-1) + (bi + bi) + b(i+1) + ... + bn + p
            = b0 + b1 + ... + b(i-1) +     0     + b(i+1) + ... + bn + p

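Here + denotes exclusive or. The last two lines say that a lost bit bi can be recomputed as the exclusive or of all of the surviving bits together with the parity bit.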
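
The same identity works on whole blocks, one byte at a time. The sketch below uses hypothetical helper names and assumes all blocks have the same length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise exclusive or of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
    parity = xor_blocks(data)
    # Pretend the disk holding data[1] failed and rebuild it.
    print(reconstruct([data[0], data[2]], parity) == data[1])
```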

RAID 3 uses byte-level striping with a dedicated parity disk. RAID 4 uses block-level striping with a dedicated parity disk.

D1	D2	D3	D4
B1	B2	B3	P1-3
B4	B5	B6	P4-6
B7	B8	B9	P7-9
B10	B11	B12	P10-12
...	...	...	...

RAID 4 requires every write to access the parity disk, which makes the parity disk a bottleneck for writes and can cause more wear on it. RAID 5 stripes the parity across all of the disks:

D1	D2	D3	D4
B1	B2	B3	P1-3
B4	B5	P4-6	B6
B7	P7-9	B8	B9
P10-12	B10	B11	B12
B13	B14	B15	P13-15
B16	B17	P16-18	B18
...	...	...	...
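
The rotation in the table can be generated mechanically. The rule below (parity starts on the last disk and moves one disk to the left with each stripe) is inferred from the table; real implementations use several different rotation schemes, and the function name is made up for these notes.

```python
def raid5_layout(num_disks, num_stripes):
    """Return the rows of a RAID 5 layout table as lists of labels."""
    rows = []
    next_block = 1
    for stripe in range(num_stripes):
        # Parity sits on the last disk for stripe 0, then rotates leftwards.
        parity_disk = (num_disks - 1) - (stripe % num_disks)
        first, last = next_block, next_block + num_disks - 2
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{first}-{last}")
            else:
                row.append(f"B{next_block}")
                next_block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(4, 6):
        print("\t".join(row))
```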

Advantages of RAID 5:

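- Good read throughput: (N-1) times the single disk throughput.
- Reasonably good failure tolerance (tolerates the failure of any 1 of the N disks). Data is lost only if two or more disks fail: if the single-disk failure rate is 10%, that chance is 1% for a particular pair of disks and about 5% for a 4-disk array.
- Good overhead: N disks can hold (N-1) disks' worth of data.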
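
The chance of losing a RAID 5 array depends on how many disks it has, since any two failures are fatal. A rough calculation, assuming independent failures and ignoring rebuild time (the function name is hypothetical):

```python
from math import comb


def raid5_failure_probability(p, num_disks):
    """Chance that two or more of num_disks fail, assuming independent failures."""
    # The array survives if zero disks or exactly one disk fails.
    survive = sum(
        comb(num_disks, k) * p**k * (1 - p) ** (num_disks - k) for k in (0, 1)
    )
    return 1 - survive


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, raid5_failure_probability(0.10, n))
```

With two disks this reduces to p^2; every additional disk adds more pairs that can fail together.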

Disadvantages of RAID 5:

- Bad write throughput: a small write requires reading the old data and parity (or the rest of the stripe), computing the new parity, and performing two writes (the data block and the parity block).
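
The write penalty comes from keeping parity consistent. One way to do it without touching the rest of the stripe is to fold the change into the old parity, which works because exclusive or is its own inverse. This is a sketch with made-up function names; a real controller might instead re-read the whole stripe.

```python
def full_parity(stripe):
    """Parity of a stripe, computed from scratch over all data blocks."""
    parity = bytearray(len(stripe[0]))
    for block in stripe:
        for j, byte in enumerate(block):
            parity[j] ^= byte
    return bytes(parity)


def update_parity(old_parity, old_data, new_data):
    """New parity after overwriting one block: remove the old data, add the new."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = full_parity(stripe)
    new_block = b"\xaa\xbb"
    # Two reads (old data, old parity) and two writes (new data, new parity).
    parity = update_parity(parity, stripe[1], new_block)
    stripe[1] = new_block
    print(parity == full_parity(stripe))
```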

Reed-Solomon encoding (RAID 6)

As disks become larger, recovery takes longer and longer, increasing the probability that a second disk fails before the first has been rebuilt. Two simultaneous failures will completely destroy a RAID 5 array.

Reed-Solomon codes generalize parity to allow two or more parity bits to be computed. With two bits of parity, one can correct two known failures. This allows two simultaneous failures to be handled.

RAID 6 uses two striped parity blocks.
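
One common way to build the two parity blocks, often called P and Q, is sketched below. P is ordinary parity; Q weights block i by 2^i using arithmetic in the finite field GF(2^8). The field polynomial 0x11D, the generator 2 and all function names are choices made for this sketch, not something these notes prescribe, and only the case of two lost data blocks is handled.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, exponent):
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 = 1 in GF(2^8)."""
    return gf_pow(a, 254)


def compute_pq(data_blocks):
    """Return (P, Q): P is plain parity, Q weights block i by 2^i."""
    size = len(data_blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data_blocks):
        coefficient = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coefficient, byte)
    return bytes(p), bytes(q)


def recover_two(data_blocks, x, y, p, q):
    """Rebuild data blocks x and y from the surviving blocks, P and Q."""
    size = len(p)
    # Treat the lost blocks as zeros: zeros contribute nothing to P or Q.
    survivors = [bytes(size) if i in (x, y) else b for i, b in enumerate(data_blocks)]
    p_part, q_part = compute_pq(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    scale = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for j in range(size):
        p_xy = p[j] ^ p_part[j]   # equals Dx + Dy
        q_xy = q[j] ^ q_part[j]   # equals 2^x * Dx + 2^y * Dy
        dx[j] = gf_mul(q_xy ^ gf_mul(gy, p_xy), scale)
        dy[j] = p_xy ^ dx[j]
    return bytes(dx), bytes(dy)


if __name__ == "__main__":
    data = [b"RAID", b"six!", b"demo", b"data"]
    p, q = compute_pq(data)
    print(recover_two([None, data[1], None, data[3]], 0, 2, p, q))
```

Losing one data block and one parity block is easier: rebuild the data block from whichever parity survives, then recompute the other.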

Nested RAID

One can also "nest" different RAID levels by attaching whole RAID arrays, each with its own controller, to a RAID controller in place of individual disks. That is, multiple RAID 0 controllers can be plugged into a single RAID 1 controller, providing some of the benefits of striping and some of the benefits of mirroring. This arrangement is called RAID 0+1 or RAID 01.

Similarly, multiple RAID 1 arrays can be placed in a single RAID 0 array. This is called RAID 1+0 or RAID 10.
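
Nesting is easy to picture if disks and arrays share one interface, so that an array can stand wherever a disk can. The classes below are a toy model with made-up names: blocks are numbered from 0, and failures are not modelled at all.

```python
class Disk:
    """A toy disk: a dict from block number to data."""

    def __init__(self):
        self.blocks = {}

    def write(self, n, data):
        self.blocks[n] = data

    def read(self, n):
        return self.blocks[n]


class Stripe:
    """RAID 0 over any devices that offer read and write."""

    def __init__(self, devices):
        self.devices = devices

    def _locate(self, n):
        # Block n goes to device n mod N, at position n div N on that device.
        return self.devices[n % len(self.devices)], n // len(self.devices)

    def write(self, n, data):
        device, inner = self._locate(n)
        device.write(inner, data)

    def read(self, n):
        device, inner = self._locate(n)
        return device.read(inner)


class Mirror:
    """RAID 1 over any devices that offer read and write."""

    def __init__(self, devices):
        self.devices = devices

    def write(self, n, data):
        for device in self.devices:
            device.write(n, data)

    def read(self, n):
        return self.devices[0].read(n)


# RAID 0+1: a mirror whose "disks" are striped arrays.
raid01 = Mirror([Stripe([Disk(), Disk()]), Stripe([Disk(), Disk()])])
# RAID 1+0: a stripe whose "disks" are mirrored arrays.
raid10 = Stripe([Mirror([Disk(), Disk()]), Mirror([Disk(), Disk()])])
```

Both arrangements use four disks and store two disks' worth of data; they differ in where each copy of a block ends up.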

Advantages and disadvantages of these RAID levels are left as an exercise.