Ph.D., Princeton University, 1979
My research is in distributed computing. In particular, I work on methodologies, paradigms, and algorithms for highly available and secure distributed systems. My long-term goal is to help bridge the gap between theoretical results and the need for efficient and practical solutions.
My recent research effort, in collaboration with M. Aguilera and W. Chen, is on the use of unreliable failure detectors for designing reliable distributed systems. We studied the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first proposed new failure detectors that are particularly suitable for the crash-recovery model. We next determined under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we gave two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice, namely those with no failures or failure detector mistakes.
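To make the idea of a failure detector for the crash-recovery model a little more concrete, here is a minimal Python sketch. It is not the algorithm from the papers mentioned on this page; it assumes a simple timeout-based heartbeat scheme in which each process tags its heartbeats with an epoch number counting how many times it has recovered (the sketch assumes this counter survives crashes). The class `EpochHeartbeatDetector`, its `timeout` parameter, and its method names are hypothetical. The point it illustrates: the detector reports not only which processes it trusts but also their epochs, so a process that keeps crashing and recovering shows up as one whose epoch keeps rising.

```python
from typing import Dict


class EpochHeartbeatDetector:
    """Hypothetical, illustrative failure detector for the crash-recovery model.

    Assumption: every heartbeat carries the sender's epoch, i.e. how many
    times the sender has recovered. A process is trusted if a heartbeat
    arrived within `timeout` time units; it is reported with its latest epoch.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}  # process -> arrival time of newest accepted heartbeat
        self.epoch: Dict[str, int] = {}        # process -> highest epoch observed so far

    def on_heartbeat(self, sender: str, epoch: int, now: float) -> None:
        # Lossy links may drop or delay heartbeats; a stale heartbeat from an
        # older epoch is ignored so that a reported epoch never decreases.
        if epoch < self.epoch.get(sender, 0):
            return
        self.epoch[sender] = epoch
        self.last_seen[sender] = now

    def trustlist(self, now: float) -> Dict[str, int]:
        # Processes heard from recently, each paired with its latest epoch.
        return {
            p: self.epoch[p]
            for p, t in self.last_seen.items()
            if now - t <= self.timeout
        }


if __name__ == "__main__":
    fd = EpochHeartbeatDetector(timeout=2.0)
    fd.on_heartbeat("p", epoch=0, now=0.0)
    fd.on_heartbeat("q", epoch=0, now=0.0)
    fd.on_heartbeat("q", epoch=1, now=1.0)  # q crashed and recovered once
    fd.on_heartbeat("q", epoch=2, now=1.5)  # ... and again: its epoch keeps rising
    print(fd.trustlist(now=1.5))  # {'p': 0, 'q': 2}
    print(fd.trustlist(now=3.0))  # {'q': 2}  (p timed out)
```

A real detector would also have to adapt its timeout and decide when a rising epoch marks a process as unstable; this sketch leaves both to the caller.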
Director, Master of Engineering Program, Computer Science Department
- Randomization and failure detection: A hybrid approach to solve Consensus. Journal of Computing 28, 3 (June 1999), 890-903 (with M.
- Using the Heartbeat failure detector for quiescent reliable communication and consensus in partitionable networks. Invited paper in Theoretical Computer Science 220, special issue on Distributed Algorithms, 1 (June 1999), 3-30 (with M. Aguilera and W. Chen).