The Design and Architecture of the Microsoft Cluster Service
-- A Practical Approach to High-Availability and Scalability

Werner Vogels, Dan Dumitriu, Ken Birman
Dept. of Computer Science, Cornell University

Rod Gamache, Mike Massa, Rob Short, John Vert
Microsoft Cluster group, Microsoft Corporation

Joe Barrera, Jim Gray
Scalable Server group, Microsoft Research


Microsoft Cluster Service (MSCS) extends the Windows NT operating system to support high-availability services. The goal is to offer an execution environment in which off-the-shelf server applications remain continuously available, even in the presence of node failures. Later versions of MSCS will provide scalability via a node and application management system that allows applications to scale to hundreds of nodes. In this paper we provide a detailed description of the MSCS architecture and the design decisions that have driven the implementation of the service. The paper also describes how some major applications use the MSCS features, as well as the features added to make it easier to implement and manage fault-tolerant applications on MSCS.

Copyright 1998 IEEE. Published in the Proceedings of FTCS'98, June 23-25, 1998 in Munich, Germany. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.

