ANALYSIS OF CONDITIONAL MTTF OF FAULT-TOLERANT SYSTEMS

HOON CHOI
Department of Computer Engineering, Chungnam National University, 220 Kung-dong, Taejon 305-764, South Korea

WEI WANG
212 Federal City Road, Lawrenceville, NJ 08648, U.S.A.

and KISHOR S. TRIVEDI
Department of Electrical Engineering, Duke University, Box 90291, Durham, NC 27708-0291, U.S.A.

(Received for publication 19 May 1997)

Abstract: Mean time to failure (MTTF) is one of the most frequently used dependability measures in practice. By convention, MTTF is the expected time for a system to reach any one of the failure states. For some systems, however, the mean time to absorb to a subset of the failure states is of interest. Therefore, the concept of conditional MTTF may well be useful. In this paper, we formalize the definition of conditional MTTF and cumulative conditional MTTF with an efficient computation method in a finite state space Markov model. Analysis of a fault-tolerant disk array system and a fault-tolerant software structure are given to illustrate application of the conditional MTTF. © 1998 Published by Elsevier Science Ltd. All rights reserved.

INTRODUCTION

In many practical situations, there are multiple causes of system failure. For instance, failure of a computer system may be due to the failure of processor or memory or disk subsystem. In a disk subsystem, a failure may result in data destruction which is accompanied by a warning. Alternatively, data may be corrupted without being detected [1]. The latter event is more serious than the former, so we would like the probability of the latter event to be small and the mean time to the occurrence of the latter event to be rather long. In fault-tolerant software such as a recovery block [2, 3], the acceptance of a module's erroneous outputs by the acceptance test is more disastrous than a recovery block failure due to exhaustion of alternate modules.
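To make the distinction between the two kinds of disk failure concrete, the following is a minimal sketch, not the formalization or computation method developed in this paper. It assumes one common reading of "mean time to absorb to a subset of the failure states", namely the expected time to absorption given that absorption occurs in that subset, and uses the standard absorbing continuous-time Markov chain identities p = (-Q_TT)^{-1} r and E[T; absorbed in subset] = (-Q_TT)^{-1} p. The two-state model, the rate values and all names (`solve`, `conditional_mttf`, `LAM`, `MU`, `LAM_D`, `LAM_U`) are hypothetical and chosen only for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def conditional_mttf(q_tt, exit_rates):
    """Return (absorption probabilities, conditional mean absorption times).

    q_tt       -- generator restricted to the transient (operational) states
    exit_rates -- rate from each transient state into the failure subset of interest
    Both results are indexed by the initial transient state.
    """
    neg_q = [[-v for v in row] for row in q_tt]
    prob = solve(neg_q, exit_rates)   # P(absorbed in subset | start in i)
    weighted = solve(neg_q, prob)     # E[T * 1{absorbed in subset} | start in i]
    cond = [w / p if p > 0 else float("nan") for w, p in zip(weighted, prob)]
    return prob, cond


# Hypothetical rates per hour (assumptions, not taken from the paper).
LAM = 1e-3    # state 0 (all disks up) -> state 1 (degraded)
MU = 1e-1     # repair: state 1 -> state 0
LAM_D = 2e-3  # state 1 -> data loss accompanied by a warning (detected)
LAM_U = 1e-4  # state 0 -> silent data corruption (undetected)

Q_TT = [[-(LAM + LAM_U), LAM],
        [MU, -(MU + LAM_D)]]
TO_DETECTED = [0.0, LAM_D]
TO_UNDETECTED = [LAM_U, 0.0]

p_det, t_det = conditional_mttf(Q_TT, TO_DETECTED)
p_und, t_und = conditional_mttf(Q_TT, TO_UNDETECTED)
# Conventional MTTF: expected time to reach any failure state.
mttf = solve([[-v for v in row] for row in Q_TT], [1.0, 1.0])

print("P(detected), P(undetected) from state 0:", p_det[0], p_und[0])
print("conditional mean times from state 0:", t_det[0], t_und[0])
print("conventional MTTF from state 0:", mttf[0])
```

Under these assumptions the conventional MTTF is the probability-weighted average of the two conditional mean times, which gives a quick consistency check on any such computation.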
Transient probability of system failure is easily broken down into its constituent causes and has been reported by many existing reliability modeling tools. While these probabilities can be used to estimate the system's susceptibility to various failure causes, the mean time to failure (MTTF) due to different causes provides more practical information. There have been some studies on MTTF [4–7]. The MTTF from a given initial state is one example, and the mean residual life at time t [4], which is the expected time to failure given that the system has been operational up to time t, is another. Heidelberger et al. [5] describe an efficient numerical method for computing MTTF in a Markovian dependability model. In Johnson et al. [6], the conditional expectation is defined to be the expected time to failure given that the failure occurs within a specific time window. All these studies, however, deal with the MTTF to the group of all failure states. The MTTF due to certain failure causes, or the MTTF to a given subset of failure states, is a relatively unexplored topic. In Ciardo et al. [8], it is observed that the probability of absorption to a partition of absorbing states from a given initial state may be computed as the accumulated reward until absorption by assigning zero reward rate to the states in the partition and positive reward rate to all the other absorbing states. The possibility of computing the expectation and distribution of time given that the process is absorbed in a state with zero reward is also speculated. In Choi and Trivedi [9], we define the notion of the MTTF to a subset of failure states that is named the conditional MTTF and show its solution method in a Markov dependability model. Arlat and Laprie [10] define the term mean safe time or mean time before catastrophic failures for high safety systems.
They assume that the system is brought back to operation after a benign (safe) failure and the time to reset the system after such a fail-
Microelectron. Reliab., Vol. 38, No. 3, pp. 393–401, 1998. Printed in Great Britain. 0026-2714/98 $19.00 + 0.00. PII: S0026-2714(97)00043-7