Reliability Metrics for Distributed Systems
Traditional metrics are hard to apply in multi-component systems:
• In a large network, some component is likely to be failing at any given moment, but very few users will notice it.
• A system with excellent average reliability may still give terrible service to certain users.
• There are so many components that system administrators must rely on automatic reporting systems to identify problem areas.
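
The second point can be illustrated with a small sketch. The data, the user names, the 0.99 threshold, and the function summarize below are all hypothetical, chosen only to show how an average can hide one badly served user; they are not taken from any real system.

```python
def summarize(success_rate_by_user, threshold=0.99):
    """Return the average success rate, the worst-served user,
    and the users whose success rate is below the threshold."""
    rates = list(success_rate_by_user.values())
    average = sum(rates) / len(rates)
    worst_user = min(success_rate_by_user, key=success_rate_by_user.get)
    below = sorted(u for u, r in success_rate_by_user.items() if r < threshold)
    return average, worst_user, below


# Hypothetical data: 99 users with perfect service, one user at 40%.
rates = {f"user{i}": 1.0 for i in range(99)}
rates["user99"] = 0.40

average, worst_user, below = summarize(rates)
print(f"average={average:.3f}")                      # average=0.994
print(f"worst={worst_user} ({rates[worst_user]:.2f})")  # worst=user99 (0.40)
print(f"below threshold: {below}")                   # below threshold: ['user99']
```

The system-wide average of 0.994 looks excellent, yet user99 has more than half of all requests fail. Reporting the worst case or the count of users below a threshold alongside the average is one way to make such cases visible.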