Supporting systems of systems hazard analysis using multi-agent simulation

Rob Alexander ⇑, Tim Kelly

Department of Computer Science, University of York, York, United Kingdom

Article info

Article history: Received 12 July 2011; received in revised form 15 June 2012; accepted 29 July 2012; available online 3 September 2012

Keywords: Safety; Simulation; System of systems; Hazard analysis; Multi-agent

Abstract

When engineers create a safety-critical system, they need to perform an adequate hazard analysis. For Systems of Systems (SoSs), however, hazard analysis is difficult because of the complexity of SoS and the environments they inhabit. Traditional hazard analysis techniques often rely upon static models of component interaction and have difficulty exploring the effects of multiple coincident failures. They cannot be relied on, therefore, to provide adequate hazard analysis of SoS. This paper presents a hazard analysis technique (SimHAZAN) that uses multi-agent modelling and simulation to explore the effects of deviant node behaviour within a SoS. It defines a systematic process for developing multi-agent models of SoS, starting from existing models in the MODAF architecture framework and proceeding to implemented simulation models. It then describes a process for running these simulations in an exploratory way, bounded by estimated probability. This process generates extensive logs of simulated events; in order to extract the causes of accidents from these logs, this paper presents a tool-supported analysis technique that uses machine learning and agent behaviour tracing. The approach is evaluated by comparison to some explicit requirements for SoS hazard analysis, and by applying it to a case study. Based on the case study, it appears that SimHAZAN has the potential to reveal hazards that are difficult to discover when using traditional techniques.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

A growing challenge for safety engineers is maintaining the safety of large-scale military and transport Systems of Systems (SoSs), such as Air Traffic Control (ATC) networks and military units with Network Enabled Capability (NEC). The term “SoS” can be defined in terms of key characteristics (Alexander et al., 2004): SoS consist of multiple components that are systems in their own right, each having their own goals and some degree of autonomy but needing to communicate and collaborate in order to achieve overall SoS goals. SoS are typically distributed over large areas (such as regions, countries or entire continents), and their components frequently interact with each other in an ad hoc fashion. It follows that military and transport SoS have the potential to cause large-scale destruction and injury. This is particularly true for SoS incorporating new kinds of autonomous component systems, such as Unmanned Aerial Vehicles (UAVs).

This paper is concerned with one aspect of the safety process for SoS, specifically hazard analysis: determining the distinct causal chains by which the behaviour of the SoS can lead to an accident. Hazard analysis is a crucial part of any risk-based safety approach, but the defining characteristics of SoS make it very difficult.

Recent developments in SoS are likely to worsen the SoS safety problem. For example, there is a move towards dynamic reconfiguration, which greatly expands the number of system states that need to be considered; any analysis may need to be carried out for all possible configurations. Similarly, SoS increasingly use ad hoc communications, meaning that information errors can propagate through the system by many unpredictable routes. These factors overwhelm the ability of manual hazard analysis and therefore suggest a need for automated hazard analysis.
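To give a feel for the scale involved, the following minimal sketch counts the distinct ways in which several nodes of a SoS could deviate at the same time. The node count, the number of deviation types per node, and the counting scheme itself (each node shows at most one deviation at a time) are illustrative assumptions, not figures or definitions taken from this paper.

```python
from math import comb


def coincident_deviation_cases(nodes: int, deviations_per_node: int,
                               max_coincident: int) -> int:
    """Count the ways for 1..max_coincident nodes to deviate at once.

    Hypothetical counting scheme: each deviating node exhibits exactly one
    of `deviations_per_node` deviation types, and the order in which the
    nodes deviate is ignored.
    """
    return sum(
        comb(nodes, k) * deviations_per_node ** k
        for k in range(1, max_coincident + 1)
    )


if __name__ == "__main__":
    # Illustrative values only: 20 nodes, 5 deviation types per node.
    for limit in (1, 2, 3):
        cases = coincident_deviation_cases(20, 5, limit)
        print(f"up to {limit} coincident deviation(s): {cases} cases")
```

Under these assumed values the count grows from 100 cases for single deviations to 4850 for pairs and 147350 for triples, before any reconfiguration of the SoS is taken into account. Growth of this kind is the reason manual enumeration quickly becomes impractical.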
There are a few automated approaches specifically designed for SoS safety, but those that exist typically lack any kind of systematic modelling process, or have very limited applicability in terms of the models they can analyse and require models that are built specifically for that analysis (for example, many approaches based on model-checking). Most of the extant SoS-specific methods are aimed at safety risk assessment (deriving quantitative values for the risk posed by the SoS); few of them are focussed specifically on hazard identification and hazard analysis (discovering the different hazards in the SoS and the distinct combinations of causes that can lead to them).

⇑ Corresponding author. Address: Department of Computer Science, University of York, Deramore Lane, York YO10 5GH, United Kingdom. Tel.: +44 1904 325 474, +44 7813 134 388.
E-mail addresses: rob.alexander@york.ac.uk (R. Alexander), tim.kelly@cs.york.ac.uk (T. Kelly).
0925-7535/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ssci.2012.07.006
Safety Science 51 (2013) 302–318. Journal homepage: www.elsevier.com/locate/ssci

This paper presents SimHAZAN: a partly automated hazard analysis method for SoS that avoids some of the problems associated with existing techniques. In particular, it has a systematic modelling process and a separate analysis approach that can be applied either to models developed through that process or to models developed