On the Cost of Fault-Tolerant Consensus When There Are No Faults – A Tutorial

Idit Keidar    Sergio Rajsbaum

December 17, 2013

Abstract

We consider the consensus problem in asynchronous models enriched with unreliable failure detectors or partial synchrony, where processes can crash or links may fail by losing messages. We study the number of communication steps performed by deterministic consensus algorithms for these models in failure-free executions. We show a tight lower bound of two communication steps. In the process of showing this bound, we give a simple unified proof of a number of different impossibility and lower bound results. Thus, we shed light on the relationship among different lower bounds, and at the same time, illustrate a general technique for obtaining simple and elegant lower bound and impossibility proofs. We illustrate the matching upper bound by describing previously published algorithms that achieve the lower bound.

1 Introduction

Consensus is a fundamental problem in distributed computing theory and practice alike. A consensus service allows a collection of processes to agree upon a common value. More specifically, each process has an input, and each correct process must decide on an output, such that all correct processes decide on the same output, and furthermore, this output is the input of one of the processes. Consensus is an important building block for fault-tolerant distributed systems [Lam96]: to achieve fault tolerance, data is replicated, and consensus can be used to guarantee replica consistency using the state machine approach [Lam78, Sch90].

Consensus is not solvable in pure asynchronous models where even one process can crash [FLP85]¹. However, real systems are not completely asynchronous. Some partially synchronous models [DLS88, CF99] better model real systems.
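As an illustration only, the requirements in the definition of consensus above (each correct process decides, all correct processes decide on the same output, and that output is the input of some process) can be written as a check over the outcome of one finished execution. The function name `check_consensus`, the dictionary representation of an execution, and the conventional labels "termination", "agreement" and "validity" are assumptions of this sketch rather than notation taken from the text.

```python
from typing import Dict, Hashable, Set


def check_consensus(
    inputs: Dict[str, Hashable],
    decisions: Dict[str, Hashable],
    correct: Set[str],
) -> Dict[str, bool]:
    """Check the consensus requirements on the outcome of one execution.

    inputs    -- input value of every process (hypothetical representation)
    decisions -- output value of each process that decided
    correct   -- identifiers of the processes that did not crash
    """
    # Values decided by correct processes only; crashed processes are ignored.
    decided = {decisions[p] for p in correct if p in decisions}

    return {
        # Each correct process must decide on an output.
        "termination": all(p in decisions for p in correct),
        # All correct processes decide on the same output.
        "agreement": len(decided) <= 1,
        # The decided output is the input of one of the processes.
        "validity": all(v in set(inputs.values()) for v in decided),
    }
```

For example, with inputs `{"p1": 0, "p2": 1, "p3": 1}`, where `p3` crashes and `p1` and `p2` both decide `1`, all three checks hold. If `p1` decided `0` instead, the agreement check would fail.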
We consider a particularly realistic kind of model [DLS88] which allows the system to be asynchronous for an unbounded but finite period of time, as long as it eventually becomes synchronous, and less than a majority of the processes can crash. Consensus is solvable in this model. Similarly, consensus is solvable in an asynchronous model enriched with certain oracle failure detectors [CT96], including unreliable failure detectors that can provide arbitrary output for an arbitrary period of time, but eventually provide some useful semantics.

* Preliminary version appeared in SIGACT News 32(2), Distributed Computing column, pages 45–63, June 2001.
MIT Lab for Computer Science. 545 Technology Square, NE43-367, Cambridge, MA 02139, U.S.A. E-mail: idish@theory.lcs.mit.edu.
Compaq Cambridge Research Laboratory, One Cambridge Center, Cambridge, MA 02142-1612, U.S.A. E-mail: Sergio.Rajsbaum@compaq.com, rajsbaum@math.unam.mx. On leave from Instituto de Matemáticas, UNAM.
¹ Consensus can be solved in the asynchronous model by randomized algorithms, and in some shared memory models. In this paper, we consider only message-passing, deterministic algorithms.

In this