Efficient Analysis of Fault Trees with Voting Gates

Jianwen Xiang 1, Kazuo Yanoo 1, Yoshiharu Maeno 1, Kumiko Tadano 1, Fumio Machida 1, Atsushi Kobayashi 2, and Takao Osa
Service Platforms Research Labs. 1, Service Platform Systems Development Division 2
NEC Corporation, Kawasaki, 211-8666 Japan
Email: {j-xiang@ah, k-yanoo@ab, y-maeno@aj, k-tadano@bq, f-machida@ab, a-kobayashi@bu, t-osaki@bc}.jp.nec.com

Abstract: The voting (k-out-of-n, k/n) gate is a standard logic gate of fault trees for modelling fault-tolerant systems. Traditionally, a voting gate is expanded into a formula of AND and OR gates during fault tree analysis. This expansion can cause a combinatorial explosion in the calculation of the minimal cut sets (MCSs) of the fault tree even for moderate n, especially when the inputs of the voting gate are intermediate rather than basic events. In this paper, we propose a set of reduction rules that simplify voting gates without direct expansion. We introduce the concept of a minimal cut vote (MCV) to denote a k/n gate whose inputs are all basic events and whose k-combinations are all MCSs of the fault tree. With the proposed reduction rules and the concept of MCV, the MCSs of fault trees can be evaluated and weeded efficiently, and the result can be represented in a more compact form. Experiments on practical fault trees with voting gates, and comparisons with commercial and academic fault tree analysis tools, show that our method not only outperforms conventional MCS evaluation methods by several orders of magnitude but also performs competitively with binary decision diagram (BDD) based algorithms.

I. INTRODUCTION

Fault tree analysis (FTA) [?] is a traditional reliability analysis technique.
It is a deductive procedure for determining the combinations of basic component failures that can cause a specific undesired top event at the system level. Standard Boolean logic constructs, such as AND, OR, and voting (k-out-of-n, k/n) gates, are used to decompose the fault events and construct the fault tree. A k/n gate has n input events, and its output event occurs if at least k of the input events occur. Voting gates are widely used to model fault-tolerant systems, such as clusters.

One of the main purposes of FTA is to find all the smallest combinations (logic products) of basic events that result in the top event, namely the minimal cut sets (MCSs). Both qualitative and quantitative analysis can then be carried out based on the MCSs, such as identifying critical components (e.g., single points of failure, SPOFs) and calculating system unreliability with the traditional inclusion-exclusion method.

When MCSs are calculated with the traditional top-down method, a k/n gate is typically expanded as a sum of products of its input events. Direct enumeration of the products, as in the IRRAS algorithm [?], has time and space complexity $O(\binom{n}{k})$, which can be very costly when $k \approx n/2$ and n is not too small. A recursive decomposition method is proposed in [?], as shown in Eq. 1, where $\frac{k}{n}(e_1,\ldots,e_n)$ denotes the k/n gate over the inputs $e_1,\ldots,e_n$. Although the time complexity can be reduced to $O(k \cdot n)$, the space complexity remains combinatorial, i.e., $O(\binom{n}{k})$ in terms of the number of products in the final result.

$$\frac{k}{n}(e_1,\ldots,e_n) = e_1 \cdot \frac{k-1}{n-1}(e_2,\ldots,e_n) + \frac{k}{n-1}(e_2,\ldots,e_n) \tag{1}$$

The expansion of k/n gates has several drawbacks. First, it can easily lead to space explosion in practice, especially in large-scale systems where a cluster usually consists of dozens or hundreds of (redundant) components.
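To make the space cost concrete, the following is a minimal Python sketch of the recursion in Eq. 1 for a k/n gate whose inputs are distinct basic events. The function name `expand_vote` and the representation of a product as a `frozenset` of event names are assumptions made for illustration only. The sketch materializes every product, so it illustrates the size of the result and does not attempt the $O(k \cdot n)$ time bound cited above.

```python
from itertools import combinations
from math import comb


def expand_vote(k, events):
    """Expand a k-out-of-n gate over distinct basic events into its products.

    Mirrors Eq. 1: either the first event is part of the product (and k-1
    more are needed from the rest), or it is not (and k are still needed).
    """
    if k <= 0:
        return [frozenset()]  # nothing more required: one empty product
    if k > len(events):
        return []             # not enough inputs left: no product
    head, rest = events[0], events[1:]
    with_head = [p | {head} for p in expand_vote(k - 1, rest)]
    without_head = expand_vote(k, rest)
    return with_head + without_head


if __name__ == "__main__":
    events = ["e1", "e2", "e3", "e4"]
    products = expand_vote(2, events)
    # The recursion yields exactly the k-combinations of the inputs.
    assert set(products) == {frozenset(c) for c in combinations(events, 2)}
    print(len(products), comb(len(events), 2))
    # Size of the expansion for larger gates, without building it.
    print(comb(20, 10), comb(100, 50))
```

The number of products equals $\binom{n}{k}$ regardless of how the expansion is computed; for n = 20 and k = 10 the expanded form already holds 184,756 products.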
The situation can be much worse when the inputs of the k/n gates are not basic events but intermediate events, such as nested voting gates and OR gates, which are typically introduced by the (vertical and horizontal) functional dependencies between different components [?]. For instance, if each input of a k/n gate is an OR gate with m events, the space complexity grows further to $O(m^k \cdot \binom{n}{k})$. We have encountered this problem in practical fault trees with k/n gates of reasonable size in our case studies, and we present such an example, a fault tree of a 2-tier web system, in Section V.

A side effect of expansion is that it may introduce “specious” repeated events, which in turn increase the cost of minimizing the cut sets. In an extreme case, given a fault tree consisting of only one voting gate whose inputs are distinct basic events, the expansion transforms a fault tree without repeated events into a tree with a set of repeated events, so unnecessary minimization may have to be applied to the cut sets afterwards.

Moreover, expansion may complicate the quantitative analysis of fault trees. For instance, if a fault tree consists of only one k/n gate, the time complexity of calculating the system failure rate can become $O(2^{\binom{n}{k}})$ with the traditional inclusion-exclusion method. By combinatorial analysis, the probability of the output event of a k/n gate can be calculated in $O(\sum_{j=k}^{n} \binom{n}{j})$ [?], but this requires that the combinatorial structure be kept rather than expanded. If the k/n gates are expanded, it is generally difficult to recover combinatorial structures from a potentially large number of MCSs. The problem becomes more complex if the inputs of k/n gates include repeated events.
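The gap between the two quantitative approaches can be illustrated with a small Python sketch. It assumes a single k/n gate whose inputs are independent basic events with known occurrence probabilities; the function names `vote_prob_combinatorial` and `vote_prob_inclusion_exclusion` are hypothetical and are not taken from the cited work. The first function sums over the input subsets of size at least k, and the second applies inclusion-exclusion to the $\binom{n}{k}$ cut sets of the expanded gate.

```python
from itertools import combinations
from math import prod


def vote_prob_combinatorial(k, probs):
    """P(at least k of n independent inputs occur), one term per subset."""
    n = len(probs)
    total = 0.0
    for j in range(k, n + 1):
        for occurred in combinations(range(n), j):
            chosen = set(occurred)
            # Exactly the inputs in `chosen` occur, all others do not.
            total += prod(p if i in chosen else 1.0 - p
                          for i, p in enumerate(probs))
    return total


def vote_prob_inclusion_exclusion(k, probs):
    """Same probability via inclusion-exclusion over the expanded cut sets."""
    n = len(probs)
    cut_sets = [frozenset(c) for c in combinations(range(n), k)]
    total = 0.0
    for r in range(1, len(cut_sets) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for group in combinations(cut_sets, r):
            # Joint occurrence of several cut sets = all events in their union.
            union = frozenset().union(*group)
            total += sign * prod(probs[i] for i in union)
    return total


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.3, 0.4]
    a = vote_prob_combinatorial(2, probs)
    b = vote_prob_inclusion_exclusion(2, probs)
    assert abs(a - b) < 1e-12
    print(a, b)
```

For a 2/4 gate the first function visits 11 subsets, whereas the second visits all 63 non-empty groups of the 6 cut sets, and the second count grows as $2^{\binom{n}{k}}$. The subset sum is valid only while the inputs are distinct, independent events; that assumption no longer holds once the inputs share repeated events.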
In such cases, it would be very difficult to derive (re-aggregate) new voting gates from the MCSs for combinatorial analysis, since some products included in the original k/n gates may be removed