IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 2, MARCH 2007, p. 249

Automating Scenario Analysis of Human and System Reliability

Alistair G. Sutcliffe, Member, IEEE, and Andreas Gregoriades

Abstract—The system reliability analyzer tool for analyzing the reliability of system designs is described, and its use is illustrated in a systems engineering case study of a naval command and control system. The performance of systems consisting of human operators and technology components is assessed by Bayesian nets, which calculate error probabilities from inputs of agent properties and environmental conditions. The tool tests scenarios representing the system design and its operational behavior, which is modeled as cycles of command and control tasks. The tool indicates weak points in the scenario sequence and assesses the reliability of one or more system designs under a set of operational scenarios and a variety of environmental conditions.

Index Terms—Human factors, system reliability, system requirements and specifications.

I. INTRODUCTION

Previous approaches to assessing human reliability have employed fault/event trees to diagnose potential failure points in system operation (e.g., the technique for human error rate prediction (THERP) [44]). Performance-shaping factors [43] have been used to estimate the probable effect of human variables, such as operator fatigue, on system failure [45]. However, several authors have called for a more systematic approach to estimating human error based on sound models of psychology [12], [33], [46]. Taxonomies of errors, such as slips and mistakes [33], phenotypes and genotypes [13], and types of slips and lapses [27], provide a more principled approach, which has been used to calculate the influence of factors such as task load, stress, and training on the probability of slip and mistake errors in system designs of differing complexity [45].
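The performance-shaping-factor style of calculation mentioned above can be illustrated with a minimal sketch. This is not the THERP procedure itself; the base error probability, the factor names, and the multiplier values below are hypothetical, chosen only to show how contextual variables such as fatigue or task load can scale a nominal human error probability.

```python
# Illustrative sketch of a performance-shaping-factor (PSF) adjustment:
# a nominal human error probability (HEP) is scaled by multipliers for
# adverse conditions and capped at 1.0. All numbers are hypothetical.

def adjusted_hep(base_hep: float, psf_multipliers: dict[str, float]) -> float:
    """Scale a nominal HEP by the product of the active PSF multipliers."""
    p = base_hep
    for _factor, multiplier in psf_multipliers.items():
        p *= multiplier
    return min(p, 1.0)

# Hypothetical case: a fatigued operator working under high task load.
hep = adjusted_hep(0.003, {"fatigue": 2.0, "high_task_load": 1.5})
print(round(hep, 6))  # 0.009
```

The cap at 1.0 reflects that the result must remain a probability even when several adverse multipliers combine.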
However, a large number of events can potentially cause failures, and event tree analysis methods can only address a small number of causes that are described a priori in the event/fault tree. Furthermore, hierarchical models hinder the analysis of multiple interactions between events and system states that frequently lead to accidents. As Reason [33], [34] notes, failure has diverse and multiple potential causes arising from the social/organizational environment, poor maintenance of equipment, adverse operating environments, human error, and poor design. To improve design and prevent, or at least reduce, the potential for system failure, analysis methods that estimate the probable influence of multiple factors on system failure are required. To address this problem, probabilistic models that combine environmental and psychological influences on human errors have been developed [16], [45]. However, these models cannot account for different types of initiating event that may lead to system failure. While it is impossible to anticipate all possible hazardous events that a system may encounter, there has been increasing interest in using scenarios as test probes for safety analysis [1], [10].

Manuscript received November 12, 2004; revised June 3, 2005, August 16, 2005, and September 27, 2005. This work was supported in part by EPSRC Systems Integration in Major Projects (SIMP). This paper was recommended by Associate Editor J. K. Kuchar. A. G. Sutcliffe is with the School of Informatics, University of Manchester, M60 1QD Manchester, U.K. (e-mail: a.g.sutcliffe@man.ac.uk). A. Gregoriades is with the Surrey Defence Technology Centre, The School of Management, University of Surrey, GU2 7XH Surrey, U.K. (e-mail: a.gregoriades@surrey.ac.uk). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCA.2006.886375
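The probabilistic models discussed above combine several contributing factors into a single failure probability. One standard way a Bayesian-net node can combine independent causal influences is the noisy-OR model; the sketch below is illustrative only, with hypothetical causes and probabilities, and is not the conditional probability structure used in [16] or [45].

```python
# Noisy-OR combination: each present cause i independently produces the
# failure with probability p_i, so
#   P(failure) = 1 - product of (1 - p_i) over the causes that are present.
# The causes and probabilities here are hypothetical, for illustration only.

def noisy_or(cause_probs: dict[str, float], present: set[str]) -> float:
    """Probability of failure given which candidate causes are present."""
    p_no_failure = 1.0
    for cause, p in cause_probs.items():
        if cause in present:
            p_no_failure *= 1.0 - p
    return 1.0 - p_no_failure

causes = {
    "poor_maintenance": 0.05,
    "adverse_environment": 0.02,
    "operator_error": 0.01,
}
p_fail = noisy_or(causes, {"poor_maintenance", "adverse_environment", "operator_error"})
print(round(p_fail, 5))  # 0.07831
```

Because the causes act independently in this model, removing any one of them lowers the overall failure probability; with no causes present the failure probability is zero.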
Scenarios are narratives that describe the usage or operation of a system, drawn either from experience of accidents or from imagined future situations of system operation. They usually contain an event sequence with contextual information that allows the analyst to interpret the likelihood of a design succeeding or failing. Scenarios can be used as test data to challenge designs and their implicit assumptions by positing obstacles that might prevent safe system operation from being achieved, and hence to refine system requirements to create defenses against failure [30], [31], [40]. In previous work, we created checklists that probed for potential causes of failure, developed from Hollnagel's taxonomy [13] and Reason's theory of human error [33], which enabled scenarios to be "walked through" to evaluate the likelihood of failure in event sequences [9]. This approach was partially automated by a pathway expansion algorithm that detected branch points in an event sequence and then provided test questions to probe alternative paths [42]. Unfortunately, this produced too many alternatives and questions, which made the analysis too time consuming.

Bayesian nets (BNs) have been applied to reason about reliability based on the properties of products and development processes in several domains, ranging from military vehicles to software [2], [6], [8], [26]. However, to date, BNs have only been applied to assessing the reliability of designs based on component properties; no account has been taken of the operational events the system has to respond to. In this paper, we describe a software tool that takes the automation of reliability analysis one step further by reasoning not only about properties of the design but also about how the design interacts with the environment and the events it has to respond to in one or more operational scenarios.

This paper is organized into three sections.
First, the system reliability analyzer (SRA) architecture and the BNs are briefly described. This is followed by a case study in which the tool is applied. Finally, this paper describes the lessons learned from the validation experiences to date and discusses future development of our approach.

1083-4427/$25.00 © 2007 IEEE