Consensus Based on Failure Detectors with a Perpetual Accuracy Property

Achour MOSTEFAOUI and Michel RAYNAL
IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France
Tel: (+33) 2 99 84 71 88   Fax: (+33) 2 99 84 25 33
{mostefaoui,raynal}@irisa.fr

Abstract

This paper is on the Consensus problem, in the context of asynchronous distributed systems made of […] processes, at most […] of which may crash. A family of failure detector classes satisfying a Perpetual Accuracy property is first defined. This family includes the failure detector class S (the class of Strong failure detectors defined by Chandra and Toueg) central to the definition of a class […] where x is the minimum number ([…]) of correct processes that can never be suspected to have crashed. Then, a protocol that solves the Consensus problem is given. This protocol works with any failure detector class […] of this family. It is particularly simple and uses a Reliable Broadcast protocol as a skeleton. It requires […] communication steps, and its communication bit complexity is […] (where […] is the maximal size of an initial value a process can propose).

Keywords: Asynchronous Distributed System, Consensus, Crash Failure, Perpetual Accuracy Property, Reliable Broadcast, Unreliable Failure Detector.

1 Introduction

The Consensus problem is now recognized as a fundamental problem when one has to design or implement reliable asynchronous distributed systems in the presence of process crashes. Informally, the Consensus problem can be defined in the following way: each process proposes a value and all non-crashed processes have to agree on a common value, which has to be one of the proposed values. It has been shown that practical agreement problems can be reduced to the Consensus problem. As an example, let us consider the Atomic Broadcast problem: all processes have to agree on the same message delivery order. This is a typical agreement problem that can be solved by reducing it to the Consensus problem [1].

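The informal definition above can be read as three checks on a finished run. The following Python sketch is only an illustration: the function name `check_consensus`, the dictionary encoding of proposals and decisions, and the explicit `crashed` set are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, Hashable, Set


def check_consensus(proposed: Dict[str, Hashable],
                    decided: Dict[str, Hashable],
                    crashed: Set[str]) -> Dict[str, bool]:
    """Check the informal Consensus requirements on one finished run.

    proposed: value proposed by each process (hypothetical encoding).
    decided:  value decided by each process that reached a decision.
    crashed:  processes that crashed during the run.
    """
    alive = [p for p in proposed if p not in crashed]
    # Every non-crashed process must have decided.
    termination = all(p in decided for p in alive)
    # Non-crashed processes that decided must share one common value.
    alive_decisions = {decided[p] for p in alive if p in decided}
    agreement = len(alive_decisions) <= 1
    # Any decided value must be one of the proposed values.
    validity = all(v in proposed.values() for v in decided.values())
    return {"termination": termination,
            "agreement": agreement,
            "validity": validity}
```

For instance, with proposals `{"p1": "a", "p2": "b", "p3": "b"}`, decisions `{"p1": "b", "p2": "b"}` and `crashed = {"p3"}`, all three checks hold; if `p1` had decided `"a"` instead, the agreement check would fail.
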
Unfortunately, solving the Consensus problem in an asynchronous distributed system where processes may crash is not a trivial task. It has been proved by Fischer, Lynch and Paterson that the Consensus problem has no deterministic solution in those systems as soon as processes (even only one) may crash [2]. The intuition that underlies this impossibility result lies in the inherent difficulty of safely distinguishing a process that has crashed from a process that is “very slow”, or from a process with which communications are “very slow”. To circumvent this impossibility result, in a seminal work [1], Chandra and Toueg have introduced the Unreliable Failure Detector concept, and studied how unreliable failure detectors can be used to solve the Consensus problem in asynchronous distributed systems with process crash failures.

A failure detector can be seen as an oracle that provides each process with a list of processes it suspects to have crashed. A failure detector can make mistakes by not suspecting a crashed process or by erroneously suspecting a non-crashed process. Chandra and Toueg have studied several classes of unreliable failure detectors. A class is defined by a Completeness property and an Accuracy property. The Completeness property concerns the actual detection of crashes. The aim of an Accuracy property is to restrict the mistakes a failure detector module can make. Furthermore, an Accuracy property can be Eventual or Perpetual. An Accuracy property is Eventual when the failure detector is allowed to satisfy it only after some time. It is Perpetual when the failure detector has to satisfy it from the beginning [1].

In this paper, we are interested in solving the Consensus problem in asynchronous distributed systems prone to process crashes augmented with unreliable failure detectors satisfying a Perpetual Accuracy property.
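
The difference between an Eventual and a Perpetual Accuracy property can be illustrated on a recorded trace of suspicion lists. The sketch below is a rough illustration under stated assumptions: the `Snapshot` encoding (one dictionary per time step, mapping each observer to the set of processes it currently suspects) and the function names are hypothetical, and a finite trace can only approximate an eventual property, which is really defined over infinite runs.

```python
from typing import Dict, List, Optional, Set

# Hypothetical encoding: observer -> processes it suspects at that time step.
Snapshot = Dict[str, Set[str]]


def first_clean_index(history: List[Snapshot], target: str) -> Optional[int]:
    """Smallest t such that no observer suspects `target` in history[t:].

    Returns None if `target` is still suspected in the last snapshot
    (or if the history is empty).
    """
    clean_from = len(history)
    # Walk backwards until the most recent suspicion of `target` is found.
    for t in range(len(history) - 1, -1, -1):
        if any(target in suspects for suspects in history[t].values()):
            break
        clean_from = t
    return clean_from if clean_from < len(history) else None


def perpetually_unsuspected(history: List[Snapshot], target: str) -> bool:
    """Perpetual flavour: `target` is never suspected, from the beginning."""
    return first_clean_index(history, target) == 0


def eventually_unsuspected(history: List[Snapshot], target: str) -> bool:
    """Eventual flavour: after some point in the trace, `target` is no longer suspected."""
    return first_clean_index(history, target) is not None
```

On a trace where `p3` is suspected only in the first snapshot, `eventually_unsuspected` holds for `p3` but `perpetually_unsuspected` does not, which is exactly the kind of early mistake a Perpetual Accuracy property rules out for the processes it protects.
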
More precisely, we consider a family of unreliable failure detector classes whose Perpetual Accuracy property is parameterized by the minimum number (x) of correct processes that cannot be suspected to have crashed. (The failure detector class denoted S [1] belongs to this family. It corresponds to the case where all but one process may be suspected.) The proposed Consensus protocol is particularly simple, namely, it