Computers & Industrial Engineering
journal homepage: www.elsevier.com/locate/caie

Decision support system for safety improvement: An approach using multiple correspondence analysis, t-SNE algorithm and K-means clustering

Krantiraditya Dhalmahapatra, Rohan Shingade, Harshawardhan Mahajan, Abhishek Verma, J. Maiti
Department of Industrial and Systems Engineering, Indian Institute of Technology, Kharagpur 721302, India

ARTICLE INFO

Keywords: Safety analytics; Near miss incidents; R²-profile; Perceptual mapping; Kernel category; Chi-square distance

ABSTRACT

An attempt has been made to develop a decision support system (DSS) for safety improvement using a multi-step knowledge discovery process involving multiple correspondence analysis (MCA), the t-SNE algorithm and K-means clustering. MCA is used for dimension reduction and perceptual mapping from categorical data. Usually, the first two dimensions are used for perceptual mapping if these two dimensions explain a significant percentage of variance. Otherwise, the traditional method of two-dimensional mapping leads to loss of important categorical information involved with other dimensions. Considering the above, a novel R²-profile approach, as an alternative to the inertia-based approach, is adopted to obtain the desired number of dimensions to be retained without loss of a significant amount of information. The t-SNE technique reduces the high-dimensional data into a two-dimensional (2D) map, which provides the associations amongst different categories. K-means clustering groups the 2D categories into homogeneous clusters as per the similarities of the categories. A novel kernel category based chi-square distance method is proposed to identify sub-clusters within a cluster, which subsequently provides useful rules for safety improvement. The methodology also provides a logical approach of dimension reduction in a form called a ‘funnel diagram’.
Finally, the DSS is applied to analysing near miss incidents that occurred in electric overhead traveling (EOT) crane operations in a steel plant. Several safety rules are identified and safety interventions are proposed.

1. Introduction

A decision support system (DSS) comprises data management, where data relevant to a specific problem are stored; model management, which converts stored data into useful information for decision making; and a user interface for providing timely recommendations from the analysed information (Druzdzel & Flynn, 2002). The application of DSS has been found to be very effective in road safety (Seneviratne, 1991; Dell’Acqua, De Luca, & Mauro, 2011; Herland, Möller, & Schandersson, 2007; Chassiakos, Panagolia, & Theodorakopoulos, 2005), where the objective is to identify possible causes of accidents and suggest cost-effective countermeasures. Volpe Lovato, Hora Fontes, Embiruçu, and Kalid (2018) proposed a decision-making tool to address conflict management in air traffic control, which led to improvement in safety and optimized use of airspace. Jeng and Tzeng (2012) developed a clinical decision support system for assessment of the behavioural safety of healthcare professionals. However, the application of DSS in industrial safety is limited. Further, the use of statistical tools for making a DSS effective is an issue, as it depends on the availability and the nature of safety data. Nevertheless, a DSS can give valuable insights for safety improvement. The motivation of the study is to present a DSS for managing crane safety and to demonstrate how statistical modelling coupled with data mining algorithms could be useful in developing a DSS when safety data are categorical in nature.

Safety data are primarily categorical in nature.
For example, some of the causes of crane incidents as provided by OSHWiki (2016) are ‘boom or crane contact with energized power lines’, ‘movement below the crane hook’, ‘overturned cranes’, ‘fall of objects’, ‘structural collapse’, ‘rigging failures’, ‘improper maintenance of crane’ and ‘unskilled operators’. In an incident report, these causes can be recorded as eight possible categorical values for the variable ‘cause of incident’. Since a major portion of the incident details is recorded as categorical data, we restrict our study to categorical variables only.

Corresponding author. E-mail address: jhareswar.maiti@hotmail.com (J. Maiti).
https://doi.org/10.1016/j.cie.2018.12.044
Received 28 June 2018; Received in revised form 1 December 2018; Accepted 16 December 2018; Available online 17 December 2018
Computers & Industrial Engineering 128 (2019) 277–289
0360-8352/ © 2018 Elsevier Ltd. All rights reserved.

Analysis of categorical data in safety studies is not new. Frequency and causal pattern analysis, and risk assessment using data mining techniques and statistical models, are some common applications in this field (McFadden, 2003; Wu & Yeh, 2006; Shyur, 2008; Nesmith, Keating, & Zacharias, 2013; Verma, Khan, Maiti, & Krishna, 2014;