A HIDDEN MARKOV MODEL BASED ALGORITHM FOR ONLINE FAULT DIAGNOSIS WITH PARTIAL AND IMPERFECT TESTS¹

Jie Ying, T. Kirubarajan, Krishna R. Pattipati
Department of Electrical & System Engineering
U-157, University of Connecticut
Storrs, CT 06269-3157
860-486-2890
krishna@sol.uconn.edu

Somnath Deb
Qualtech Systems, Inc.
6 Storrs Road, Suite 6
Willimantic, CT 06266
(860) 423-3659

Abstract - In this paper, we present a Hidden Markov Model (HMM) based algorithm for online fault diagnosis in complex large-scale systems with partial and imperfect tests. The HMM-based algorithm handles test uncertainties and inaccuracies, finds the best estimate of system states, and identifies dynamic changes in system states, such as a transition from a fault-free state to a faulty one. We also present two methods to estimate the model parameters, namely, the state transition probabilities and the instantaneous probabilities of observed test outcomes, for adaptive fault diagnosis. In order to validate the adaptive parameter estimation techniques, we present simulation results with and without knowledge of the HMM parameters. In addition, the advantages of using the HMM approach over a Hamming-distance based fault diagnosis technique are quantified. Tradeoffs in complexity versus performance of the diagnostic algorithm are discussed.

INTRODUCTION

Recent advances in sensor technology, communications and computational capabilities have made online system health monitoring more feasible than it was in the past. By means of smart sensors on board a system, low-level decisions are made and processed based on sensed waveforms. These are then fused by a real-time monitoring and inferencing system. However, low-level decisions may be in error due to improper threshold selection, electromagnetic interference, environmental conditions, etc. Imperfect tests introduce an additional element of uncertainty into the diagnostic process.
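As a rough illustration of how a hidden Markov model can absorb this kind of uncertainty, the following minimal sketch runs a generic two-state forward filter for a single component monitored by a single imperfect test. It is not the algorithm developed in this paper. The transition probabilities, the detection and false-alarm probabilities, and all function and variable names are assumptions chosen purely for illustration. A test outcome of None stands for a sampling instant at which the test result is unavailable.

```python
# Illustrative sketch only: a generic two-state HMM forward filter for one
# component observed through one imperfect test. All numbers are assumed.

FAULT_FREE, FAULTY = 0, 1
STATES = (FAULT_FREE, FAULTY)

# Assumed transition probabilities: TRANSITION[i][j] = P(next = j | current = i).
TRANSITION = [
    [0.99, 0.01],  # a healthy component rarely fails between samples
    [0.00, 1.00],  # a fault persists once it occurs (no repair is modeled)
]

P_DETECT = 0.90       # assumed P(test = FAIL | component faulty)
P_FALSE_ALARM = 0.05  # assumed P(test = FAIL | component fault-free)


def likelihood(outcome, state):
    """Return P(outcome | state); a missing outcome (None) carries no information."""
    if outcome is None:
        return 1.0
    p_fail = P_DETECT if state == FAULTY else P_FALSE_ALARM
    return p_fail if outcome == "FAIL" else 1.0 - p_fail


def filter_step(belief, outcome):
    """One predict/update step: propagate the belief, then weight by the test outcome."""
    predicted = [
        sum(belief[i] * TRANSITION[i][j] for i in STATES) for j in STATES
    ]
    weighted = [predicted[s] * likelihood(outcome, s) for s in STATES]
    total = sum(weighted)
    return [w / total for w in weighted]


def run_filter(outcomes, prior=(1.0, 0.0)):
    """Return P(faulty) after each outcome, starting from the given prior belief."""
    belief = list(prior)
    history = []
    for outcome in outcomes:
        belief = filter_step(belief, outcome)
        history.append(belief[FAULTY])
    return history


if __name__ == "__main__":
    observed = ["PASS", "FAIL", None, "FAIL", "FAIL"]
    for outcome, p_faulty in zip(observed, run_filter(observed)):
        print(f"{str(outcome):>4}  P(faulty) = {p_faulty:.3f}")
```

With these assumed numbers, a single "FAIL" following a "PASS" leaves the fault probability well below one half, a missing outcome changes the belief only through the transition model, and repeated "FAIL" outcomes drive the belief toward the faulty state.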
A "PASS" outcome of a test does not guarantee the integrity of the components under test because the test may have missed a fault. On the other hand, a "FAIL" outcome of a test does not mean that one or more of the implicated components are faulty because the test outcome may have been a false alarm. Consequently, the diagnostic procedures must hedge against this uncertainty in test outcomes. In addition, at any sampling time, the results of all tests in a system are not available due to varying sampling rates of sensors and signal processing limitations.

In this paper, we consider the problem of determining the states of components given a set of partial and unreliable tests. Figure 1 describes the overall architecture of the integrated monitoring and diagnostic process. Analog sensor data are sent to a signal processing module. After signal features are checked in terms of trend, range and threshold, a "PASS" or "FAIL" test outcome is fed to the online diagnosis system, which contains a set of inference engines, such as HMM, TEAMS-RT [12], Maximum Likelihood, Hamming distance, etc.

¹ Research supported in part by NASA-Ames Research Center and Qualtech Systems, Inc.
0-7803-5432-X/99/$10.00 © 1999 IEEE

In our previous work, Shakeri, Raghavan and Pattipati proposed a diagnosis algorithm [1] based on Lagrangian relaxation and subgradient optimization. In this method, it is assumed that the state of the system is static but unknown [1][2], i.e., all the states of the underlying Markov chain are absorbing. Consequently, this method does not exploit the inherent dynamics of system state evolution. Smyth [3] applied hidden Markov models for fault detection in dynamic systems. In this work, the system