112 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 31, NO. 2, MARCH 2001
Operative Diagnosis of Graph-Based Systems with
Multiple Faults
Stefano Chessa and Paolo Santi
Abstract—The problem of multiple faults diagnosis in
safety-critical systems is considered. Error propagation between
system components is modeled as a directed graph, where the
errors propagate instantaneously along the edges. Some of the
system components are equipped with alarms, which ring when
abnormal conditions are detected. A diagnosis algorithm identifies
the set of potential failure sources based on the set of ringing
alarms. This paper introduces the D-FAULTS algorithm, which
diagnoses the system when at most two nodes can be failure
sources at any time. The concept of sequential diagnosis is also
introduced, to deal with an unknown number of faults. Sequential
diagnosis is aimed at locating the smallest set of nodes containing
at least one fault. Using this approach, a faulty system can be
restored to normal condition by repeatedly executing the diagnosis
and repair phases. To this end, we introduce the sequential
diagnosis algorithm S-DIAG, which has optimal time complexity.
Index Terms—Graph-based systems, multiple faults diagnosis,
operative diagnosis, safety-critical systems, sequential diagnosis.
I. INTRODUCTION
FAULT diagnosis deals with locating the faulty components
of a system, so that they can be replaced or repaired in order
to restore normal operating conditions. The problem of fault
diagnosis is of primary importance in safety-critical systems,
such as aircraft systems [1], space vehicles [16], process plants
[11], and chemical industries [7], [19], where prompt
localization of faults is vital. Among the models proposed for fault
diagnosis in safety-critical systems [3], [4], [17], graph-based
models have been widely studied [6], [9], [10], [13], [18], [19].
Manuscript received June 12, 2000; revised December 23, 2000. This paper
was recommended by Associate Editor G. Biswas.
S. Chessa is with the Dipartimento di Informatica, University of Pisa, 56125
Pisa, Italy (e-mail: chessa@iei.pi.cnr.it).
P. Santi is with the Istituto di Elaborazione dell’Informazione, Area della
Ricerca San Cataldo, 56100 Pisa, Italy.
Publisher Item Identifier S 1083-4427(01)02334-7.
In this paper we deal with the operative diagnosis problem,
where a set of potential failure sources is identified based on
a set of alarms attached to some of the system components.
More specifically, we consider the graph-based model of
[13], where the system is represented by a directed graph,
called the system graph. The node set represents system
components, and the edges represent error propagation
between components. Some of the nodes are equipped with
alarms, whose state can be either silent or ringing (when an
abnormal condition is detected). An external observer (either
a human operator or a machine) monitors the state of the alarms. The
diagnosis process, which starts when at least one alarm rings,
returns a set of potential failure sources, which
contains all the faulty nodes and possibly some fault-free
nodes. The nodes in this set are then tested individually in order to
repair or replace all faulty components.
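The model described above can be illustrated with a minimal Python sketch. It is not the algorithm of [13]; it is an exhaustive check under stated assumptions: alarms are reliable, an error at a node instantly reaches every node reachable from it (zero-time propagation), a faulty node also triggers its own alarm if it has one, and at most one node is faulty. The names `reachable_alarms`, `single_fault_candidates`, and the toy graph are hypothetical and chosen only for illustration.

```python
from collections import deque


def reachable_alarms(graph, alarms, source):
    """Return the alarms reached by an error starting at `source`.

    `graph` maps each node to a list of its successors; the source
    itself counts as reached (an assumption of this sketch).
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen & alarms


def single_fault_candidates(graph, alarms, ringing):
    """Nodes whose fault alone would make exactly `ringing` ring."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    return {v for v in nodes if reachable_alarms(graph, alarms, v) == ringing}


# Toy system graph: a -> b -> d and a -> c -> d, with alarms on b and d.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
toy_alarms = {"b", "d"}

candidates_bd = single_fault_candidates(toy_graph, toy_alarms, {"b", "d"})
candidates_d = single_fault_candidates(toy_graph, toy_alarms, {"d"})
```

In this toy graph, nodes `a` and `b` cannot be told apart when both alarms ring, so both are returned as potential failure sources and would then be tested individually.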
Based on the propagation time of errors, systems are
classified as zero-time or nonzero time. In zero-time systems,
error propagation appears to be instantaneous to the observer.
These systems are modeled assuming zero propagation time
along the edges. Conversely, in nonzero time systems errors
propagate over a time that is significantly longer than the
reaction time of the observer. Nonzero time systems are further
divided into two classes. In the first class, propagation times for
every edge are known and time invariant. In the second class,
propagation times are unknown and/or vary with time.
The operative diagnosis problem when an arbitrary number
of nodes is faulty is NP-complete both in zero and nonzero time
systems [13]. However, efficient solutions for the single-fault
diagnosis problem, i.e., diagnosis assuming that at most one
component is faulty, were presented in [13]. More specifically,
a single-fault diagnosis algorithm for zero-time
systems was introduced, where , , and denote the number
of nodes, edges, and alarms, respectively. The complexity can
be reduced to with a preprocessing cost
of . An algorithm and an
algorithm with an preprocessing cost for nonzero time
systems with known propagation times, and an
algorithm for nonzero time systems with unknown propagation
time were also presented. The author of [13] also performed an
expected-value analysis of two single-fault diagnosis algorithms
for zero-time systems [14], and introduced a parallel single-fault
diagnosis algorithm for the same class of systems [15].
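To make the gap between the single-fault and the multiple-fault cases concrete, the following Python sketch enumerates every fault set of bounded size that is consistent with the ringing alarms in a zero-time system. It is an exhaustive search written only for illustration, not the D-FAULTS algorithm of this paper nor any algorithm of [13]. It assumes reliable alarms, that a faulty node triggers every alarm reachable from it (its own included), and that the ringing alarms are exactly the union of the alarms triggered by the faulty nodes. The names `reachable`, `consistent_fault_sets`, and `max_faults` are hypothetical.

```python
from itertools import combinations


def reachable(graph, source):
    """Nodes reachable from `source` (source included), by depth-first search."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def consistent_fault_sets(graph, alarms, ringing, max_faults):
    """List every set of at most `max_faults` nodes that explains `ringing`."""
    nodes = sorted(set(graph) | {v for succs in graph.values() for v in succs})
    # Alarm signature of each node: the alarms a fault at that node triggers.
    signature = {v: reachable(graph, v) & alarms for v in nodes}
    result = []
    for size in range(1, max_faults + 1):
        for subset in combinations(nodes, size):
            triggered = set().union(*(signature[v] for v in subset))
            if triggered == ringing:
                result.append(set(subset))
    return result


# Toy system graph: a -> c and b -> d, with alarms on c and d.
toy_graph = {"a": ["c"], "b": ["d"], "c": [], "d": []}
toy_alarms = {"c", "d"}

double_fault_sets = consistent_fault_sets(toy_graph, toy_alarms, {"c", "d"}, 2)
single_fault_sets = consistent_fault_sets(toy_graph, toy_alarms, {"c", "d"}, 1)
```

Here no single fault explains both alarms, while four pairs of nodes do. The number of subsets examined grows combinatorially with `max_faults`, which is why this enumeration is only practical for small bounds such as two.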
A problem related to operative diagnosis is the so-called
alarm placement problem, in which a minimum number of
alarms has to be placed so that the fault can be uniquely
diagnosed under the single-fault hypothesis. This problem
is NP-complete when the structure of the system graph is
unrestricted [13]. Optimal alarm placement algorithms for three
special classes of graphs (tree-structured graphs, single-entry
single-exit series-parallel graphs, and two-level graphs) and a
polynomial-time approximation algorithm for general graphs
were presented in [8].
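As a small illustration of the alarm placement problem, the Python sketch below checks whether a given placement allows unique single-fault diagnosis, and searches exhaustively for a smallest such placement on a tiny graph. It is not one of the algorithms of [8]. It assumes zero-time propagation, reliable alarms, and that every fault must trigger at least one alarm (so each node needs a nonempty, distinct set of reachable alarms). The names `uniquely_diagnosable` and `minimum_placement` are hypothetical.

```python
from itertools import combinations


def reachable(graph, source):
    """Nodes reachable from `source` (source included), by depth-first search."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def uniquely_diagnosable(graph, alarms):
    """True if every node triggers a nonempty, distinct set of alarms."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    signatures = [frozenset(reachable(graph, v) & set(alarms)) for v in nodes]
    return all(signatures) and len(set(signatures)) == len(signatures)


def minimum_placement(graph):
    """Smallest alarm set allowing unique diagnosis, or None (exhaustive)."""
    nodes = sorted(set(graph) | {v for succs in graph.values() for v in succs})
    for size in range(1, len(nodes) + 1):
        for placement in combinations(nodes, size):
            if uniquely_diagnosable(graph, placement):
                return set(placement)
    return None


# Toy system graph: a -> b and a -> c.
toy_graph = {"a": ["b", "c"], "b": [], "c": []}
best = minimum_placement(toy_graph)
```

For this toy graph, alarms on `b` and `c` suffice: a fault at `a` rings both, while a fault at `b` or `c` rings only its own alarm. The search tries every subset of nodes, so it is usable only on very small graphs.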
In this paper the problem of multiple faults diagnosis in
zero-time systems is addressed. First, we assume that at
most two components of the system are faulty (double-fault
diagnosis), and we give an (where )
double-fault diagnosis algorithm. In many cases of practical
interest, the time complexity of the algorithm can be reduced to
assuming a proper implementation of
some set operators. Furthermore, we introduce the sequential
1083–4427/01$10.00 © 2001 IEEE