Techniques for understanding concurrent programs are becoming increasingly important as distributed computing systems become widespread. Our research has focused on the development of these techniques.
We have been heavily involved in applying assertional reasoning to the design of concurrent, distributed, fault-tolerant, and real-time programs. One result of this research is the fail-stop processor abstraction, a methodology for its use, and designs for prototype implementations. We have also developed proof rules for a variety of message-passing primitives and have developed an approach for proving temporal properties of concurrent programs that does not employ temporal logic. Most recently, we have been investigating the use of refinement mappings in implementing distributed and fault-tolerant services. Just as a loop invariant can guide the design of a loop, we have found that a refinement mapping can be used to guide the design of a (distributed) service. Creativity is required to identify candidate refinement mappings for a given service specification, but it is then straightforward to design the protocols to support that service.
In addition to developing techniques for reasoning about programs, we have also been investigating useful paradigms that underlie protocols for distributed systems. Our early work concentrated on synchronization and the state machine approach (so-called active replication) for constructing reliable, decentralized control regimes. More recently, we have been trying to understand the primary-backup (or primary-copy) approach to implementing such services. We have derived lower bounds for various key cost metrics for primary-backup style implementations. We have also constructed primary-backup protocols that are optimal in these cost metrics. The hope is to understand the trade-offs in using various approaches for achieving fault tolerance.