Modeling the reliability of a group membership protocol for dual-scheduled time division multiple access networks ☆
Valério Rosset a,1, Pedro F. Souto b,⁎, Paulo Portugal a, Francisco Vasques c
a Departamento de Engenharia Electrotécnica e de Computadores, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465 Porto, Portugal
b Departamento de Engenharia Informática, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465 Porto, Portugal
c Departamento de Engenharia Mecânica, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465 Porto, Portugal
Article info
Article history:
Received 27 May 2009
Accepted 12 October 2011
Available online 20 October 2011
Keywords:
Distributed computing
Reliability
Fault-tolerance
FlexRay
Group membership
Abstract
We present reliability models for a group membership protocol (GMP) designed for TDMA networks such as FlexRay,
a protocol that is likely to become the de facto standard for next-generation automotive networks. The models
are based on discrete-time Markov chains and consider a comprehensive set of fault scenarios. Furthermore,
they are parametric, allowing for a sensitivity analysis. The results, obtained by numerical solution of the
models using the PRISM model checker, show that the models are computationally practical for realistic
configurations and that the GMP can achieve reliability levels in the range required for safety-critical applications.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Fieldbus-based communication systems play a prominent role in
several application domains, ranging from manufacturing and process
industries to automotive and avionics systems. They represent the
backbone of modern networked control systems, providing a communication
infrastructure that supports control, supervision, monitoring
and diagnosis applications [1].
Many of these application domains impose stringent dependability
requirements, since a failure of the control system may have serious
consequences for the system under control or its environment,
such as economic losses or severe human injury. Typical examples
are safety-critical applications for automotive x-by-wire systems [2]
and industrial machinery [3]. Therefore, these systems must include
fault tolerance mechanisms to prevent faults from causing a system
failure.
Recently, a new class of time division multiple access (TDMA)
control protocols, which we call dual-scheduled TDMA (DuST), has
emerged as an important solution to provide fault-tolerant
communications for safety-critical applications. Some examples [4]
of these solutions are TT-CAN [5], FTT-CAN [6,7] and, most
importantly, FlexRay [8], which is expected to become the de facto
standard for next-generation automotive networks.
It is widely accepted [9] that services such as group membership
and distributed agreement make the systematic development of
safety-critical applications easier. Because these services are used as
building blocks of safety-critical applications, it is important to ensure
their correctness, ideally through a formal proof. However, any proof
relies on fault assumptions and it ensures that the service behaves
correctly only as long as the fault assumptions hold true.
Given that applications in the automotive domain are
mission-oriented, the reliability of these services is the most
important dependability attribute. In this paper, we present models
for evaluating the reliability of a group membership protocol (GMP)
[10] by equating its reliability to the reliability of the assumptions
made in its proof, i.e. the probability that these assumptions hold.
Because the GMP is executed repeatedly and its fault assumptions
are stated on a per-execution basis, we present discrete-time Markov
chain (DTMC) models whose time step is closely related to the GMP
execution.
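As a minimal illustration of this idea (not one of the models developed in
this paper), consider a hypothetical two-state DTMC in which each time step
corresponds to one GMP execution: the chain remains in the state where the
fault assumptions hold with probability 1 - p, and moves to an absorbing
state, where they are violated, with probability p. The per-execution
violation probability p, the function name and the two-state structure are
assumptions made only for this sketch; the actual models distinguish many
more fault scenarios and are solved with PRISM.

```python
def reliability(p_violation: float, executions: int) -> float:
    """Probability that the fault assumptions hold for every one of
    `executions` consecutive protocol executions (hypothetical model)."""
    if not 0.0 <= p_violation <= 1.0:
        raise ValueError("p_violation must be a probability")
    if executions < 0:
        raise ValueError("executions must be non-negative")

    # Transition matrix of the illustrative chain.
    # State 0: assumptions hold; state 1: assumptions violated (absorbing).
    P = [
        [1.0 - p_violation, p_violation],
        [0.0, 1.0],
    ]

    # Start with the assumptions holding, then advance one step per execution.
    dist = [1.0, 0.0]
    for _ in range(executions):
        dist = [
            dist[0] * P[0][0] + dist[1] * P[1][0],
            dist[0] * P[0][1] + dist[1] * P[1][1],
        ]

    # Reliability is the probability mass still in state 0.
    return dist[0]
```

For this simplified chain the result coincides with the closed form
(1 - p)^n for n executions, which shows how a per-execution assumption
translates into a reliability figure over a whole mission.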
The fault models considered are rather comprehensive and
include both permanent and transient faults affecting both nodes
and communication channels. In addition to standard message loss
that affects a single message and is equally perceived by all good
nodes, we consider two other types of faults: 1) strictly omission
asymmetric communication faults [11] in which some good nodes
Computer Standards & Interfaces 34 (2012) 281–291
☆ This work was partially supported by the Portuguese Fundação para a Ciência
under Project PTDC/EIA74313/2006 and by Scholarship BD 19302/2004.
⁎ Corresponding author. Tel.: +351 22 5081855; fax: +351 22 557 4103.
E-mail addresses: vrosset@unifesp.br (V. Rosset), pfs@fe.up.pt (P.F. Souto),
pportugal@fe.up.pt (P. Portugal), vasques@fe.up.pt (F. Vasques).
1 This author's current affiliation is Instituto de Ciência e Tecnologia, Universidade
Federal de São Paulo (UNIFESP), Rua Talim, 330 - Vila Nair, São José dos Campos, SP,
Brasil.
0920-5489/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.csi.2011.10.004