Master Failures in the Precision Time Protocol
Georg Gaderer¹, Stefano Rinaldi², and Nikolaus Kerö³
¹ Research Unit for Integrated Sensor Systems, Austrian Academy of Sciences
² University of Brescia
³ Oregano Systems Design and Consulting GmbH
Abstract – If all clocks within a distributed system share the same
notion of time, the application domain gains several advantages,
among them the possibility to implement real-time behavior,
accurate time stamping, and event detection. However, with the
widespread application of clock synchronization, another topic has
to be taken into consideration: fault tolerance. The well-known
clock synchronization protocol IEEE1588 (Precision Time Protocol,
PTP) is based on a master/slave principle, which has one severe
disadvantage: the failure of a master automatically requires the
election of a new master. The start of a master election is based
on a timeout and thus takes a certain time span, during which the
clocks are not synchronized and run freely. Moreover, the use of a
new master also requires new delay measurements, which prolong the
time of uncertainty as well.
This paper analyzes the results of such a master failure and
proposes democratic master groups instead of hot-stand-by masters
to overcome this problem. It is shown by means of simulation that
the proposed solution does not deteriorate the accuracy of the slave
clocks in case of a master failure.
Keywords – Fault tolerance, Clock Synchronization, Computer
Networks, IEEE Keywords
INTRODUCTION
Clocks¹ representing the same notion of time have many
advantages in distributed systems, the most obvious being the
possibility to schedule coordinated actions such as synchronized
communication. This can be used to establish real-time behavior,
which is compulsory for TDMA schemes. Another application
of synchronized clocks is the identification, ordering, and
timing quantization of events in a distributed system. Again,
the applications of this approach are widespread; a well-known
one is LAN eXtensions for Instrumentation (LXI) [1,2], where
test and measurement devices are synchronized over Ethernet in
order to conveniently set up a-posteriori triggering.
Approaches to meet these synchronization requirements are
well studied and usually take advantage of communication
networks. Synchronization is done by periodically exchanging
messages to align the clocks with respect to each other. This
method is widely used, for example in the Internet-standard
Network Time Protocol (NTP) or in the more accurate
IEEE1588 [2] Precision Time Protocol (PTP) standard.
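As a minimal sketch of how such a message exchange yields a clock correction, the following assumes the delay request-response mechanism of IEEE1588 and a symmetric path delay between master and slave. The function and variable names are illustrative and are not taken from the standard or from this paper.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave offset and one-way delay from four timestamps.

    t1: master sends Sync            (read on the master clock)
    t2: slave receives Sync          (read on the slave clock)
    t3: slave sends Delay_Req        (read on the slave clock)
    t4: master receives Delay_Req    (read on the master clock)

    Assumption: the path delay is the same in both directions.
    """
    master_to_slave = t2 - t1   # delay + offset
    slave_to_master = t4 - t3   # delay - offset
    delay = (master_to_slave + slave_to_master) / 2
    offset = (master_to_slave - slave_to_master) / 2
    return offset, delay


# Hypothetical example: the slave clock is 5 units ahead of the master
# and the one-way path delay is 2 units.
offset, delay = ptp_offset_and_delay(t1=100.0, t2=107.0, t3=120.0, t4=117.0)
print(offset, delay)
```

Because the delay estimate belongs to one specific master/slave path, a slave that switches to a different master has to repeat this measurement before it can correct its clock reliably.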
¹ The work presented in this paper is partly funded by the
European Fund for Regional Development (EFRE).
PTP is based on a master/slave principle in which a
previously elected master synchronizes its slaves via
multicast messages². However, for a considerable number of
applications even a temporary failure of the clock
synchronization is by no means acceptable. PTP handles
recovery from a master failure by means of the so-called best
master clock (BMC) algorithm; during this phase all slaves
within a synchronization (or multicast) domain run with
free-running, unsynchronized clocks while a new master is
elected. This paper proposes an approach in which multiple
masters are tied together into a so-called mastergroup, where
one or more masters may fail without any of the nodes noticing
the failure, so that the synchronization accuracy is not
deteriorated.
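To make the cost of the free-running phase concrete, the following sketch estimates how far two unsynchronized clocks can drift apart while a new master is being elected. The linear drift model and the numbers used are illustrative assumptions, not values from this paper or from the standard.

```python
def free_running_error(relative_drift_ppm: float, free_run_seconds: float) -> float:
    """Offset in seconds accumulated between two clocks that run
    unsynchronized for free_run_seconds.

    Assumption: the clocks differ by a constant relative frequency
    error of relative_drift_ppm parts per million.
    """
    return relative_drift_ppm * 1e-6 * free_run_seconds


# Hypothetical values: 50 ppm relative drift, 10 s without a master
# (timeout, election, and new delay measurement taken together).
error = free_running_error(relative_drift_ppm=50.0, free_run_seconds=10.0)
print(f"accumulated offset: {error * 1e6:.0f} microseconds")
```

Under these assumed values the accumulated offset is about 500 microseconds, which is far larger than the accuracy PTP is normally used to achieve. This is the degradation that a mastergroup is intended to avoid.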
The remainder of this paper is structured as follows: after
an analysis of the state of the art, namely the master election
process in IEEE1588, the approach to synchronize within the
group is elaborated and the proposed system is demonstrated in
a simulation experiment. Finally, a conclusion rounds off the
paper and gives an outlook on future research.
STATE OF THE ART
State-of-the-art clock synchronization techniques follow one
of two paradigms: the master/slave-based principle and the
democratic approach. The first method elects one dedicated
master in order to synchronize all other nodes. In contrast,
democratic algorithms use the clocks of several nodes, which
are then combined into an agreed clock value. Both approaches
have advantages and disadvantages: master/slave-based clock
synchronization is easy to implement and to debug, and the
dependencies within an environment running such a protocol are
simple. Democratic approaches, on the other hand, add a certain
degree of complexity to a system but have the advantage that
they can offer fault tolerance. Faulty or malfunctioning clocks
can be sorted out without sacrificing even short-term accuracy.
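One common way to combine several clock readings into an agreed value while sorting out faulty clocks is a fault-tolerant average, which discards the extreme readings before averaging. The sketch below illustrates this general technique only; it is not necessarily the algorithm proposed in this paper, and the function name and parameters are hypothetical.

```python
def fault_tolerant_average(readings, max_faulty):
    """Combine clock readings into one agreed value.

    The max_faulty lowest and max_faulty highest readings are
    discarded, so up to max_faulty arbitrarily wrong clocks cannot
    pull the result outside the range of the correct clocks.
    """
    if len(readings) <= 2 * max_faulty:
        raise ValueError("need more than 2 * max_faulty readings")
    ordered = sorted(readings)
    kept = ordered[max_faulty:len(ordered) - max_faulty]
    return sum(kept) / len(kept)


# Hypothetical readings from five clocks; one of them (250.0) is faulty.
agreed = fault_tolerant_average([10.0, 11.0, 9.0, 250.0, 12.0], max_faulty=1)
print(agreed)  # 11.0
```

The faulty reading is dropped without any timeout or election, which is why such schemes can tolerate a failure without a period of free-running clocks.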
A. IEEE 1588
The basics of IEEE1588-2002 and version 2008 are specified
in the respective standard document [2]. Secondary literature
giving an overview is available as well [3]. As this paper
focuses on the fault case that a
² Version 2008 of the IEEE1588 standard (approved at the time of
submission of this paper) allows synchronization via unicast messages as
well.
ISPCS 2008 – International IEEE Symposium on Precision Clock
Synchronization for Measurement, Control and Communication
Ann Arbor, Michigan, September 22–26, 2008
978-1-4244-2275-3/08/$25.00 ©2008 IEEE 59