Algorithmica (1997) 18: 486–511
© 1997 Springer-Verlag New York Inc.
Wait-Free Clock Synchronization¹
S. Dolev² and J. L. Welch²
Abstract. Multiprocessor computer systems are becoming increasingly important as vehicles for solving
computationally expensive problems. Synchronization among the processors is achieved with a variety of clock
configurations. A new notion of fault-tolerance for clock synchronization algorithms is defined, tailored to
the requirements and failure patterns of shared memory multiprocessors. Algorithms in this class can tolerate
any number of napping processors, where a napping processor can fail by repeatedly ceasing operation for an
arbitrary time interval and then resuming operation without necessarily recognizing that a fault has occurred.
These algorithms guarantee that, for some fixed k, once a processor P has been working correctly for at least k
time units, then as long as P continues to work correctly, (1) P does not adjust its clock, and (2) P's clock agrees with
the clock of every other processor that has also been working correctly for at least k time units. Because a working
processor must synchronize in a fixed amount of time regardless of the actions of the other processors, these
algorithms are called wait-free. Another useful type of fault-tolerance is called self-stabilization: starting with
an arbitrary state of the system, a self-stabilizing algorithm eventually reaches a point after which it correctly
performs its task.
Two wait-free clock synchronization algorithms are presented for a model with global clock pulses. The
first one is self-stabilizing; the second one is not, but it converges more quickly than the first one. The
self-stabilizing algorithm requires each processor's communication register contents to be a part of the processor's
state. This last requirement is proven necessary. A wait-free clock synchronization algorithm is also presented
for a model with local clock pulses. This algorithm is not self-stabilizing.
Key Words. Distributed computing, Algorithms, Wait-free, Self-stabilization, Clock synchronization.
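To make the two conditions of the wait-free property concrete, the following Python sketch simulates a deliberately simplified setting and checks both conditions on the resulting trace. It is an illustration under stated assumptions, not a rendering of the algorithms presented in this paper: it assumes global pulses, unbounded integer clocks that all start at 0, and that every awake processor reads all clocks atomically at each pulse and adopts the maximum plus one. The names run_pulses, check_wait_free, and schedule are hypothetical and introduced only for this example.

```python
def run_pulses(n, schedule):
    """Simulate n processors under global pulses (toy model, see assumptions above).

    schedule[t] is the set of processor ids that are awake at pulse t;
    every other processor is napping and leaves its clock untouched.
    Returns history, where history[t][p] is p's clock after pulse t.
    """
    clocks = [0] * n
    history = []
    for awake in schedule:
        # Assumption: all awake processors see the same snapshot of the clocks.
        target = max(clocks) + 1
        clocks = [target if p in awake else clocks[p] for p in range(n)]
        history.append(list(clocks))
    return history


def check_wait_free(history, schedule, k):
    """Return True if the trace satisfies both conditions for the given k.

    A processor is "steady" at pulse t if it has been awake for at least
    k consecutive pulses ending at t.
    """
    n = len(history[0])
    streak = [0] * n  # consecutive awake pulses per processor
    for t, awake in enumerate(schedule):
        for p in range(n):
            streak[p] = streak[p] + 1 if p in awake else 0
        steady = [p for p in range(n) if streak[p] >= k]
        # Condition (2): all steady processors hold the same clock value.
        if len({history[t][p] for p in steady}) > 1:
            return False
        # Condition (1): a processor that was already steady at the previous
        # pulse only ticks by one; any other change counts as an adjustment.
        for p in steady:
            if streak[p] > k and history[t][p] != history[t - 1][p] + 1:
                return False
    return True


schedule = [{0, 1}, {0}, {0, 2}, {1, 2}]
history = run_pulses(3, schedule)
print(history)
print(check_wait_free(history, schedule, k=1))
```

In this toy setting the conditions hold with k = 1 because the atomic snapshot hands every awake processor the same maximum. Bounded clocks, arbitrary initial states, and local clock pulses remove that convenience.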
1. Introduction. Multiprocessor computers are being designed with ever-increasing
numbers of processors. These multiprocessors can be used to solve problems that demand
high computation power, such as grand challenge computing problems, which previously
were not efficiently solvable. However, in order to take full advantage of multiprocessors,
it is vital that they be made fault-tolerant. Fault-tolerance is necessary in order to provide
even the same level of availability that is provided by uniprocessors, since the probability
of a crash in a multiprocessor system increases with the number of processors. Clever
fault-tolerance schemes may also be able to provide a higher level of availability, by
continuing ongoing computations even if a large number of processors fail.
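The claim that the probability of a crash grows with the number of processors can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each processor crashes independently with the same probability p during some fixed period; neither the independence assumption nor the sample values come from this paper, and the function name prob_any_crash is hypothetical.

```python
def prob_any_crash(p, n):
    """Probability that at least one of n processors crashes.

    Assumption: crashes are independent and each processor crashes
    with probability p, so all n survive with probability (1 - p) ** n.
    """
    return 1.0 - (1.0 - p) ** n


# The probability rises monotonically with the number of processors.
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_crash(0.01, n), 4))
```

Under these assumptions, a per-processor crash probability of 0.01 already gives a system of 100 processors a chance of roughly 0.63 that at least one processor crashes.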
A central issue for any multiprocessor system is the synchronization among processors.
The common synchronization component used in multiprocessors is a clock. There
are several ways to implement a clock in multiprocessor systems: (1) provide a common
clock that is connected to all the processors in the system, (2) provide a common
¹ This work was supported by NSF Presidential Young Investigator Award CCR-9396098 and Texas A&M
University Engineering Excellence funds. A preliminary version of this work was presented at the 12th ACM
Symposium on Principles of Distributed Computing, August 1993 [DW].
² Department of Computer Science, Texas A&M University, College Station, TX 77843, USA.
{shlomi, welch}@cs.tamu.edu.
Received December 20, 1993; revised January 1995. Communicated by G. N. Frederickson.