Sam Toueg

Professor
sam@cs.cornell.edu
http://www.cs.cornell.edu/home/sam/sam.html

PhD Princeton University, 1979

My research is in distributed computing. In particular, I work on methodologies, paradigms, and algorithms for highly available and secure distributed systems. My long-term goal is to help bridge the gap between theoretical results and the need for efficient, practical solutions.

My recent research, in collaboration with M. Aguilera and W. Chen, is on the use of unreliable failure detectors for designing reliable distributed systems. We studied the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first proposed new failure detectors that are particularly well suited to the crash-recovery model. We then determined under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we gave two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice, namely those with no failures or failure detector mistakes.

University Activities

Director: Master of Engineering Program, Computer Science Department

Publications

Heartbeat: a timeout-free failure detector for quiescent reliable communication. Proc. 11th International Workshop on Distributed Algorithms (Sept. 1997) Saarbrücken, Germany, Lecture Notes in Computer Science, Springer-Verlag, 126-140 (with M. Aguilera and W. Chen).

Fault-tolerant wait-free shared objects. J. ACM (May 1998) (with T. Chandra and P. Jayanti).