Redundant Arrays of Inexpensive Disks (RAID)
Review by James Ezick, February 1998; updated by Vincent Ng, March 9, 1999
State of the World (c. 1987)
-
The number of CPU instructions executed per second is growing far
faster than I/O transfer rates. This gap is due in part to the
mechanical nature of physical disks: seek time and rotational latency
are still by far the greatest inhibitors to transfer rate.
-
Amdahl's Law suggests that in the coming years the effective
improvement from faster CPUs will be diminished by an I/O bottleneck.
In essence, much of the improved CPU performance will be wasted.
-
There exist on the market today a number of brands of physical
disks covering a wide range of performance characteristics. These
include Single Large Expensive Disks (SLED) to serve commercial purposes
and smaller, cheaper disks to serve the growing home PC market.
-
For the cheaper disks, I/Os per second per actuator are
within a factor of two of the larger disks. Further, the smaller
disks are, in many cases, superior in cost per megabyte and power consumption
per megabyte.
-
Standards such as SCSI have allowed companies to embed mainframe
controller functions at low cost into the less expensive disks.
-
These factors suggest that inexpensive disks could
be used in arrays to emulate the function of larger disks. The inhibiting
factor is that inexpensive disks provide inferior reliability,
and any system relying upon banks of them offers a dismal collective level
of fault tolerance.
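Two of the quantitative points above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 10% I/O fraction, the 30,000-hour disk MTTF, and the function names are assumptions chosen for the example, and the array estimate assumes independent failures at a constant rate with no redundancy.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the run time
    is sped up by `factor` (Amdahl's Law)."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of an array with no redundancy, assuming independent disks
    with constant failure rates: the first disk failure loses data."""
    return disk_mttf_hours / num_disks


if __name__ == "__main__":
    # Assumed workload: 90% CPU time, 10% I/O time; only the CPU gets faster.
    for factor in (10, 100):
        print(f"CPU {factor}x faster -> overall {amdahl_speedup(0.9, factor):.2f}x")

    # Assumed MTTF of 30,000 hours for a single disk.
    for n in (1, 100, 1000):
        print(f"{n} disks -> array MTTF {array_mttf_hours(30_000, n):.0f} hours")
```

Under these assumptions the overall speedup can never reach 10x however fast the CPU becomes, and the array's MTTF shrinks in direct proportion to the number of disks.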
The Idea
-
The increased failure rate of inexpensive disks could be
tolerated if a system of disks were constructed in such a way as to
survive single disk failures. This is accomplished by supplementing
a bank of disks with additional (check) disks that store redundant or checksum
data. These allow the data on a failed disk to be recovered.
-
This redundancy, however, detracts from I/O performance as
well as the total megabytes per dollar ratio.
-
By varying how the redundant information is organized, five distinct levels
of RAID were defined.
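The recovery idea above can be sketched with a single parity check disk, which is one simple form of redundant data. This is a minimal illustration, not the scheme of any particular level; the block contents and function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors and the parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    data = [b"disk", b"arra", b"ys!!"]   # hypothetical blocks, one per data disk
    parity = xor_blocks(data)            # contents of the check disk

    lost = 1                             # pretend data disk 1 has failed
    survivors = [block for i, block in enumerate(data) if i != lost]
    assert rebuild(survivors, parity) == data[lost]
    print("recovered:", rebuild(survivors, parity))
```

Because XOR is its own inverse, the same routine rebuilds whichever single disk is lost, provided the failed disk can be identified.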
RAID Levels
Level 1
-
Mirrored Disks: All data is backed up completely.
-
50% of the storage capacity goes to the duplicate copy, so the cost of the
system doubles. Every write is performed twice (once per copy); a read needs
only one copy.
Level 2
-
Detect and correct single bit errors.
-
Bit-interleave the data across all disks of a group.
-
Data must be read / written in entire blocks to allow the checksum calculation.
-
Need enough check disks to identify the disk with the error: ten data disks
require four check disks; twenty-five data disks require five check disks.
-
Good for supercomputer applications, bad for transaction processing.
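The check disk counts for Level 2 follow from the usual bound for a single-error-correcting Hamming code: with G data disks and C check disks, the C check bits must distinguish G + C possible error positions plus the no-error case, so 2^C >= G + C + 1. A short sketch, with a hypothetical function name:

```python
def check_disks_needed(data_disks: int) -> int:
    """Smallest C such that 2**C >= data_disks + C + 1."""
    c = 1
    while 2 ** c < data_disks + c + 1:
        c += 1
    return c


if __name__ == "__main__":
    for g in (10, 25):
        c = check_disks_needed(g)
        print(f"{g} data disks -> {c} check disks ({c / g:.0%} overhead)")
```

The relative overhead falls as groups grow, which is one reason Level 2 suits large configurations better than small ones.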
Level 3
-
Utilize the disk hardware to detect disk failure, and rely on a single check
disk to recover from the error: effective performance per disk increases.
-
Lowest possible reliability overhead.
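The overhead difference between Levels 1 to 3 can be made concrete by comparing check disks to data disks. In this sketch the group size of ten data disks for Levels 2 and 3 is an assumption made for illustration, as are the function names.

```python
def overhead(data_disks: int, check_disks: int) -> float:
    """Extra disks bought per data disk."""
    return check_disks / data_disks


def usable_fraction(data_disks: int, check_disks: int) -> float:
    """Share of the total capacity that holds user data."""
    return data_disks / (data_disks + check_disks)


if __name__ == "__main__":
    # (level, data disks per group, check disks per group)
    groups = [("Level 1", 1, 1), ("Level 2", 10, 4), ("Level 3", 10, 1)]
    for name, g, c in groups:
        print(f"{name}: overhead {overhead(g, c):.0%}, "
              f"usable capacity {usable_fraction(g, c):.0%}")
```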
Level 4
-
Interleave data between disks on a sector level.
-
Improve performance of small transfers through parallelism for reads.
-
A small write touches only 2 disks (the data disk and the check disk) and
takes 4 accesses, 2 reads and 2 writes; the parity calculation is simpler than
that in level 3.
-
The number of writes that can be performed simultaneously is limited to the
number of groups, because every write to a group must read and write that
group's check disk.
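The four-access small write described above can be sketched as a read-modify-write of the parity: the new parity is the old parity XOR the old data XOR the new data, so the other data disks in the group are never touched. The block contents and function names below are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def small_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Return the new parity block for a one-block update.

    Costs 2 reads (old data, old parity) and 2 writes (new data, new parity),
    regardless of how many data disks the group holds.
    """
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # hypothetical data blocks
    parity = xor_blocks(*stripe)

    new_block = b"bbbb"
    parity = small_write(stripe[1], new_block, parity)
    stripe[1] = new_block

    # The shortcut matches a full recomputation over every data disk.
    assert parity == xor_blocks(*stripe)
    print("parity updated with 2 reads and 2 writes")
```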
Level 5
-
Support multiple individual writes per group by spreading the check
information across all disks instead of a single check disk.
-
Small read-modify-writes perform close to the per-disk speed of a level 1 RAID.
-
Large transfer performance per disk and high useful storage capacity percentage
similar to levels three and four.
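One way to see why Level 5 allows several writes per group is to rotate the parity block across the disks, stripe by stripe, so that different stripes use different disks for their check information. The layout below (parity on disk `stripe % num_disks`) is an assumed placement chosen for illustration, not necessarily the exact one in the paper, and all names are hypothetical.

```python
def disks_for_small_write(block: int, num_disks: int) -> tuple[int, int]:
    """Return (data_disk, parity_disk) touched by a write to a logical block.

    Assumed layout: each stripe holds num_disks - 1 data blocks plus one
    parity block, and the parity block rotates to the next disk each stripe.
    """
    stripe, offset = divmod(block, num_disks - 1)
    parity_disk = stripe % num_disks
    # Data blocks fill the remaining disks in order, skipping the parity disk.
    data_disk = offset if offset < parity_disk else offset + 1
    return data_disk, parity_disk


def can_run_in_parallel(block_a: int, block_b: int, num_disks: int) -> bool:
    """Two small writes can overlap only if they touch disjoint disks."""
    a = set(disks_for_small_write(block_a, num_disks))
    b = set(disks_for_small_write(block_b, num_disks))
    return a.isdisjoint(b)


if __name__ == "__main__":
    n = 5
    for block in (0, 4, 10):
        print(f"block {block}: (data, parity) = {disks_for_small_write(block, n)}")
    print("blocks 0 and 10 in parallel:", can_run_in_parallel(0, 10, n))
```

With a dedicated check disk, as in Level 4, every pair of writes in a group would collide on that one disk; with rotation, writes to different stripes can often proceed at the same time.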
Conclusions
-
Level 5 RAID offers roughly a tenfold improvement in
performance, reliability and power consumption, and a threefold reduction
in size, compared with a SLED.
-
RAID offers the advantage of modular growth.
-
Lower power consumption makes battery backups feasible.
Questions
-
To what extent are RAID systems scalable?
-
Is the mean time to failure calculation accurate? Would
we expect the failures to more closely approximate a normal distribution?
Does this change the expected gap between failures?
-
Would we expect failures to still occur independently for
the interconnected disks?
-
How do you like the "level" nomenclature?