Sam Toueg
Professor
sam@cs.cornell.edu
http://www.cs.cornell.edu/home/sam/sam.html
Ph.D. Princeton University, 1979


My research interests include distributed computing, fault-tolerance and real-time. I work on methodologies, paradigms, and algorithms for fault-tolerant distributed systems, in both message-passing and shared-memory systems. My long-term goal is to help bridge the gap between theoretical results and efficient, practical solutions.

Last year, we focused our research effort on three problems: reaching Consensus despite process failures, tolerating link failures, and the complexity of wait-free implementations.

A fundamental result of fault-tolerant distributed computing states that Consensus cannot be solved in asynchronous systems with failures. Since this negative result was shown, researchers looked for ways to "circumvent'' it. First among these was the randomized algorithm of Ben-Or that solves Consensus with probability 1. A different approach uses unreliable failure detectors: Tushar Chandra and I proved that Consensus can be solved using failure detectors that can make an infinite number of mistakes. Recently, Marcos Aguilera and I developed a hybrid Consensus algorithm that combines advantages from both approaches: it guarantees deterministic termination if the failure detector is reasonably accurate and probabilistic termination otherwise.

In collaboration with Anindya Basu and Bernadette Charron-Bost, we studied the effect of link failures on the solvability of problems in asynchronous distributed systems. Given a problem that can be solved in a system where the only possible failures are process crashes, is the problem still solvable if links can also fail by losing messages? For various types of link failures, we showed that the answer depends on the maximum number of process failures and the nature of the problem to be solved.

A concurrent system consists of processes communicating via shared objects. An implementation of a shared object is wait-free if every process that accesses this object is guaranteed to get a response even if all the other processes crash. In collaboration with Prasad Jayanti and King Tan, we developed a technique to obtain a linear lower bound on the time and space complexity of several deterministic wait-free implementations and on the space complexity of some randomized wait-free implementations.


University Activities

Publications


Return to:
1995-1996 Annual Report Home Page
Departmental Home Page

If you have questions or comments please contact: www@cs.cornell.edu.


Last modified: 2 November 1996 by Denise Moore (denise@cs.cornell.edu).