Contents lists available at ScienceDirect
Safety Science
journal homepage: www.elsevier.com/locate/safety

Derailment accident risk assessment based on ensemble classification method

Samira Kaeeni, Madjid Khalilian, Javad Mohammadzadeh
Department of Computer Engineering, Karaj Branch, Islamic Azad University, Karaj, Iran

ARTICLE INFO
Keywords: Data mining; Safety risk assessment; Ensemble classification

ABSTRACT
Safety plays an important role in the railway transportation industry. Planning and developing a safety system requires sufficient awareness of the specific situations that create unsafe conditions in the railway network. Derailment is known as one of the most critical types of train accident. Safety officials in this industry need to draw on the experience of past accidents to prevent their recurrence. Up-to-date tools and techniques can offer a view different from the one presented by railway safety officials. In this study, a classification model for derailment accident risk assessment is proposed, which may be used in a safety system for the railway network. Our model uses the data accumulated in the Iranian Railway accidents database. Three popular data mining techniques are applied in two steps. In the first step, Artificial Neural Networks, Naïve Bayes, and Decision Tree are used independently to predict the derailment accident risk, and each method outputs its prediction in the form of probabilities. In the second step, the outcome of each model receives a weight based on its prediction accuracy, determined by a genetic algorithm (GA), and the weighted outcomes yield the final decision for derailment accident risk assessment. To validate the model's efficiency, it was applied to a sample from the Islamic Republic of Iran Railway. The results show that the model provides high-quality information for predicting accidents and that the GA in the second step plays a significant role in improving performance.

1. Introduction

Nowadays, railways are one of the best choices for transportation to reduce pollution and avoid traffic congestion. They also offer many advantages, such as safe, economical, fast, and regular transportation of both passengers and commodities. As a result, passengers and shipping organizations prefer railways to other transportation modes. Therefore, to sustain this preference and maintain a reasonable advantage, railway transportation administrators should work responsively to raise the level of safety and reduce the issues that cause accidents.

Developing a safety system requires an intelligent system for predicting accidents based on previous accident data from the railway network and the current condition of vehicles. This can be done with a model for safety risk assessment. It is extremely important to analyze previous data in order to extract a model from the knowledge hidden in huge amounts of data. There are many well-established data mining techniques for data analysis, particularly for numeric data. Effective analysis of data from a database helps model creation and supports safety management strategies by estimating accident risk (Mirabadi and Sharifian, 2010).

For these purposes, data mining is used to extract knowledge from data. Knowledge can be defined in terms of interesting patterns, but the term "interesting" is ambiguous. According to the literature, interesting patterns extracted from data are non-trivial, previously unknown, implicit, and potentially useful. Knowledge extraction is the process of gathering data from different sources (e.g. databases), data preprocessing (cleaning, integration, transformation, etc.), statistical summary, knowledge discovery (data mining), and eventually using the extracted knowledge for decision making. Data mining has two main functionalities: descriptive and predictive data mining.
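As a rough illustration of predictive data mining in the two-step form summarized in the abstract, the sketch below combines the probability outputs of three classifiers with weights found by a small genetic algorithm. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy probabilities and labels are invented, and the function names (`combine`, `accuracy`, `evolve_weights`), the GA operators (elitism, arithmetic crossover, Gaussian mutation), and all parameter values are hypothetical choices made for illustration.

```python
import random

# Invented toy data (NOT from the paper): each tuple holds the derailment-risk
# probability reported by three hypothetical first-step models (ANN, NB, DT).
samples = [
    (0.9, 0.4, 0.6),
    (0.2, 0.7, 0.3),
    (0.8, 0.3, 0.4),
    (0.1, 0.6, 0.2),
    (0.7, 0.6, 0.9),
    (0.3, 0.2, 0.6),
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = high risk, 0 = low risk (invented)


def combine(probs, weights):
    """Weighted average of the per-model probabilities for one sample."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def accuracy(weights, samples, labels, threshold=0.5):
    """Fraction of samples classified correctly by the weighted ensemble."""
    hits = 0
    for probs, label in zip(samples, labels):
        predicted = 1 if combine(probs, weights) >= threshold else 0
        hits += predicted == label
    return hits / len(labels)


def normalize(weights):
    """Rescale weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def evolve_weights(samples, labels, n_models=3, pop_size=20,
                   generations=30, seed=0):
    """Search for ensemble weights with a very small genetic algorithm."""
    rng = random.Random(seed)
    # Start from uniform weights plus random candidates.
    population = [normalize([1.0] * n_models)]
    population += [normalize([rng.random() + 1e-9 for _ in range(n_models)])
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda w: accuracy(w, samples, labels),
                        reverse=True)
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            i = rng.randrange(n_models)
            child[i] = max(1e-9, child[i] + rng.gauss(0, 0.1))  # mutation
            children.append(normalize(child))
        population = parents + children
    return max(population, key=lambda w: accuracy(w, samples, labels))
```

Calling `evolve_weights(samples, labels)` returns three weights that sum to 1. Because the uniform weighting is part of the initial population and the best candidates always survive, the returned weights never score below plain averaging on the data used for the search; a realistic evaluation would measure accuracy on held-out data instead.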
In classification and prediction, one or more models are constructed to describe or distinguish classes or concepts for future prediction (Han et al., 2011). For example, railway accidents can be classified into different groups based on their features.

The main objective of this study is to discover meaningful patterns and trends in the derailment accident data of the Iranian Railways (RAI). Derailment is the most important source of rail accidents in Iran.

https://doi.org/10.1016/j.ssci.2017.11.006
Received 3 February 2017; Received in revised form 23 September 2017; Accepted 5 November 2017
Corresponding author. E-mail address: khalilian@kiau.ac.ir (M. Khalilian).
0925-7535/ © 2017 Elsevier Ltd. All rights reserved.

Accident analysis is one of the most common applications of data mining in transportation. Although data mining has been applied to road accident analysis many times, very little is known about its usefulness in railway accident analysis. One main reason may be the limited number of accidents on railway networks compared with those on roads. One significant category of