IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 31, NO. 2, MARCH 2001

Operative Diagnosis of Graph-Based Systems with Multiple Faults

Stefano Chessa and Paolo Santi

Abstract—The problem of multiple-fault diagnosis in safety-critical systems is considered. Error propagation between system components is modeled as a directed graph, where errors propagate instantaneously along the edges. Some of the system components are equipped with alarms, which ring when abnormal conditions are detected. A diagnosis algorithm identifies the set of potential failure sources based on the set of ringing alarms. This paper introduces the D-FAULTS algorithm, which diagnoses the system when at most two nodes can be failure sources at any time. The concept of sequential diagnosis is also introduced to deal with an unknown number of faults. Sequential diagnosis aims at locating the smallest set of nodes containing at least one fault. Using this approach, a faulty system can be restored to normal condition by repeatedly executing the diagnosis and repair phases. To this end, we introduce the sequential diagnosis algorithm S-DIAG, which has optimal time complexity.

Index Terms—Graph-based systems, multiple faults diagnosis, operative diagnosis, safety-critical systems, sequential diagnosis.

I. INTRODUCTION

Fault diagnosis deals with locating the faulty components of a system, which are then replaced or repaired in order to restore normal operating conditions. The problem of fault diagnosis is of primary importance in safety-critical systems, such as aircraft systems [1], space vehicles [16], process plants [11], and chemical industries [7], [19], where prompt localization of faults is vital. Among the models proposed for fault diagnosis in safety-critical systems [3], [4], [17], graph-based models have been widely studied [6], [9], [10], [13], [18], [19].
Manuscript received June 12, 2000; revised December 23, 2000. This paper was recommended by Associate Editor G. Biswas.
S. Chessa is with the Dipartimento di Informatica, University of Pisa, 56125 Pisa, Italy (e-mail: chessa@iei.pi.cnr.it).
P. Santi is with the Istituto di Elaborazione dell’Informazione, Area della Ricerca San Cataldo, 56100 Pisa, Italy.
Publisher Item Identifier S 1083-4427(01)02334-7.

In this paper we deal with the operative diagnosis problem, where a set of potential failure sources is identified based on a set of alarms attached to some of the system components. More specifically, we consider the graph-based model of [13], where the system is represented by a directed graph, called the system graph. The nodes represent system components and the edges represent error propagation between components. Some of the nodes are equipped with alarms, whose state can be either silent or ringing (when an abnormal condition is detected). An external observer (either a human operator or a machine) monitors the state of the alarms. The diagnosis process, which starts when at least one alarm rings, returns a set of potential failure sources, which contains all the faulty nodes and possibly some fault-free nodes. The nodes in this set are then tested individually in order to repair or replace all faulty components.

Based on the propagation time of errors, systems are classified as zero-time or nonzero-time. In zero-time systems, error propagation appears to be instantaneous to the observer. These systems are modeled assuming zero propagation time along the edges. Conversely, in nonzero-time systems errors take significantly longer to propagate than the observer takes to react. Nonzero-time systems are further divided into two classes. In the first class, the propagation times of all edges are known and time invariant. In the second class, propagation times are unknown and/or vary with time.
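The zero-time model described above can be illustrated with a minimal Python sketch. It is not taken from the paper or from [13]: the function names, the adjacency-list representation, and the example graph are hypothetical. It assumes that alarms are reliable and that an error at a node reaches every node reachable from it, so an alarm rings exactly when its node is reachable from some faulty node. The last function lists the nodes that would explain an observation if exactly one node were faulty.

```python
from collections import deque


def reachable(graph, sources):
    """Return every node reachable from `sources` (the sources included)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def ringing_alarms(graph, alarms, faults):
    """Alarms that ring when the nodes in `faults` fail (zero-time model).

    Assumption: alarms are reliable and errors reach every reachable node.
    """
    return reachable(graph, faults) & alarms


def single_fault_candidates(graph, alarms, ringing):
    """Nodes whose failure alone would produce exactly the `ringing` set.

    Assumption: exactly one node is faulty.
    """
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    return {v for v in nodes if ringing_alarms(graph, alarms, {v}) == ringing}


# Hypothetical system graph: edges give the direction of error propagation.
graph = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
alarms = {"d", "e"}  # nodes equipped with alarms

print(ringing_alarms(graph, alarms, {"a"}))
print(single_fault_candidates(graph, alarms, {"e"}))
```

In this example, a fault at node a rings only the alarm at e, and the same observation is equally well explained by a fault at c or at e. This is why diagnosis returns a set of potential failure sources whose members must then be tested individually.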
The operative diagnosis problem when an arbitrary number of nodes is faulty is NP-complete in both zero-time and nonzero-time systems [13]. However, efficient solutions for the single-fault diagnosis problem, i.e., diagnosis assuming that at most one component is faulty, were presented in [13]. More specifically, a single-fault diagnosis algorithm for zero-time systems was introduced, whose complexity is expressed in terms of the numbers of nodes, edges, and alarms. This complexity can be reduced at the price of a preprocessing cost. Two algorithms for nonzero-time systems with known propagation times, one of which requires preprocessing, and an algorithm for nonzero-time systems with unknown propagation times were also presented. The author of [13] also performed an expected-value analysis of two single-fault diagnosis algorithms for zero-time systems [14], and introduced a parallel single-fault diagnosis algorithm for the same class of systems [15].

A problem related to operative diagnosis is the so-called alarm placement problem, in which a minimum number of alarms must be placed so that the fault can be uniquely diagnosed under the single-fault hypothesis. This problem is NP-complete when the structure of the system graph is unrestricted [13]. Optimal alarm placement algorithms for three special classes of graphs (tree-structured graphs, single-entry single-exit series-parallel graphs, and two-level graphs) and a polynomial-time approximation algorithm for general graphs were presented in [8].

In this paper the problem of multiple-fault diagnosis in zero-time systems is addressed. First, we assume that at most two components of the system are faulty (double-fault diagnosis), and we give a double-fault diagnosis algorithm. In many cases of practical interest, the time complexity of the algorithm can be reduced, assuming a proper implementation of some set operators. Furthermore, we introduce the sequential