© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Achieving Elementary Cycle Synchronization between Masters in the Flexible Time-Triggered Replicated Star for Ethernet

Alberto Ballesteros 1, Julián Proenza 1, David Gessner 1, Guillermo Rodriguez-Navas 2, Thilo Sauter 3
1 DMI - Universitat de les Illes Balears, Spain
2 Mälardalen University, Västerås, Sweden
3 Center for Integrated Sensor Systems, Danube University Krems, Austria
{a.ballesteros, julian.proenza}@uib.es, davidges@gmail.com

Abstract: For a distributed embedded system (DES) to operate continuously in a dynamic environment, it must be flexible and highly reliable. This applies in particular to its communication subsystem. The Flexible Time-Triggered Replicated Star for Ethernet (FTTRS) aims at providing such a subsystem by means of a highly reliable switched-Ethernet architecture based on the Flexible Time-Triggered (FTT) paradigm, a master/slave communication paradigm where the master periodically polls the slaves using so-called trigger messages (TMs). In particular, FTTRS interconnects nodes by redundant communication paths provided by two switches, each embedding an FTT master that manages the communication. This allows FTTRS to tolerate the failure of one switch without interrupting the communication as long as the masters are replica determinate, i.e., provide identical service to the slaves. Master replica determinism entails the masters broadcasting their TMs in a lockstep fashion: when one master broadcasts a TM, the other should do the same quasi-simultaneously.
In this paper we present a solution inspired by the Precision Time Protocol (PTP) for achieving this lockstep transmission, together with preliminary results showing the precision with which we can synchronize the masters on a software prototype.

I. INTRODUCTION

There is a growing interest in operating distributed embedded systems (DES) in dynamic environments, usually for long periods of time and with high reliability. This poses new challenges for the design of the communication subsystems of such DES, which have to be flexible but still dependable.

Hard Real-Time Ethernet Switching (HaRTES) [1] has recently been suggested as a suitable infrastructure to support flexible and reliable communication for distributed embedded systems. HaRTES is an implementation of the Flexible Time-Triggered (FTT) paradigm [2] over switched Ethernet, and therefore handles real-time communication by means of a centralized master-multislave polling mechanism. Its main feature is that the FTT master is embedded within the switch itself, allowing enhanced error-detection and error-handling capabilities [1]. However, HaRTES inherits one limitation of the FTT paradigm: the master constitutes a single point of failure, because any failure of the master to deliver its service causes communication to cease.

In the FT4FTT project (which stands for Fault Tolerance for Flexible Time-Triggered Ethernet-based systems) we propose an extension of HaRTES that eliminates this single point of failure by means of a replicated star [3] [4] [5]. The resulting communication infrastructure is called Flexible Time-Triggered Replicated Star (FTTRS) and is composed of two HaRTES-based switches interconnected via several redundant links known as interlinks. As depicted in Fig. 1, slaves connect to both switches using dedicated slave links.

[Fig. 1. FTTRS architecture. Switch 1 (master 1) and Switch 2 (master 2) are joined by interlinks; Slaves A, B and C connect to the switches through slave links.]
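The replicated-star topology of Fig. 1 can be illustrated with a minimal sketch that checks whether every slave still reaches a working switch after a failure. This is only an illustration under stated assumptions: the class `ReplicatedStar`, its methods, and the node names are hypothetical and not part of FTTRS or HaRTES, and the model covers connectivity only, not the master synchronization this paper addresses.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class ReplicatedStar:
    """Hypothetical connectivity model of the topology in Fig. 1."""
    switches: Set[str]
    # Maps each slave to the set of switches it has a slave link to.
    slave_links: Dict[str, Set[str]]

    def reachable_slaves(self, failed_switches: Iterable[str]) -> Set[str]:
        # A slave is reachable if at least one of its switches is still alive.
        alive = self.switches - set(failed_switches)
        return {s for s, links in self.slave_links.items() if links & alive}

    def tolerates(self, failed_switches: Iterable[str]) -> bool:
        # The failure is tolerated if no slave loses all of its switches.
        return self.reachable_slaves(failed_switches) == set(self.slave_links)


# Topology of Fig. 1: three slaves, each linked to both switches.
star = ReplicatedStar(
    switches={"switch1", "switch2"},
    slave_links={
        "slaveA": {"switch1", "switch2"},
        "slaveB": {"switch1", "switch2"},
        "slaveC": {"switch1", "switch2"},
    },
)

print(star.tolerates({"switch1"}))
print(star.tolerates({"switch1", "switch2"}))
```

In this model a single switch failure leaves every slave connected, whereas losing both switches does not. Connectivity alone is not sufficient in practice: the surviving master must also issue the same TMs at approximately the same instants as the failed one.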
The channel replication of FTTRS naturally provides tolerance to failures of the communication links. How to handle this redundancy was first described in [3]. Later, some mechanisms for handling master replication, based on semi-active replication, were suggested in [4]. However, they were not fully defined and were not validated experimentally. This paper defines the full solution for semi-active master replication in FTTRS and shows some experimental results obtained with a first software prototype implementation.

The problems to be solved by our master replication scheme can be described in terms of the master functionality. In any network following the FTT approach, time is divided into time slots of fixed duration called Elementary Cycles (ECs). Every EC is divided into three windows: the Trigger Message, Synchronous, and Asynchronous windows. The Trigger Message window (TMW) is used by the master to construct and issue the Trigger Message (TM). This message notifies the slaves that a new EC has started and contains the list of messages that must be sent during the Synchronous window.

The challenge is therefore to ensure that, upon failure of one master, the other master can take over immediately without disturbing the ongoing communication. Since the main function of the master is to send the TM, this requires that both masters be replica determinate with respect to the TM. That is, they must be able to provide the same TM at approximately the same instant. In this way, master replacement can be made totally transparent to the slaves. This paper describes the two mechanisms that have been implemented in order to fulfill this property: (a) a novel fault-tolerant protocol for EC