Decision Support Systems 150 (2021) 113557
Available online 20 March 2021
0167-9236/© 2021 Elsevier B.V. All rights reserved.

A probabilistic Bayesian inference model to investigate injury severity in automobile crashes

Kazim Topuz a, Dursun Delen b,c,*

a School of Finance, Operations Management and International Business, University of Tulsa, Tulsa, OK, USA
b Department of Management Science and Information Systems, Oklahoma State University, Stillwater, OK, USA
c School of Management, Halic University, Istanbul 34445, Turkey

ARTICLE INFO

Keywords: Data science; Bayesian inference; Injury severity; Decision making; Cross-entropy loss function; Bayesian missing data handling

ABSTRACT

Big data analytics examines millions, if not billions, of records to unmask hidden patterns and provide actionable insights and interpretable results for various domains. One area that has great potential to leverage the value of big data and analytics is the critical analysis of traffic accidents. Investigation results help provide an in-depth understanding of the risks and suggest measures to potentially prevent these risks, thereby enhancing the well-being of individuals who may experience such accidents. This study explains existing models and proposes a data science methodology in a field where probabilistic modeling makes much sense for faster, better decision-making. The main objective of this data analytics study is to identify the high-risk factors, along with their apparent significance, that influence the probability of injury severity in automobile crashes, using a geographically representative car crash dataset. To obtain reliable, accurate, and intuitive results, a multi-step probabilistic inference model based on the Bayesian Belief Network, a highly acclaimed machine learning methodology, is proposed.
The underlying inference model provides researchers with a causally accurate way to explore the domain (with subject matter expert input) while separating issues of statistical correlation from those of causal effect. In this study, we also used the data to create a web-based probabilistic inference simulator, a Bayesian inference decision support tool that will be publicly accessible, to help decision-makers better understand variable interdependencies and conduct what-if analyses on them.

1. Introduction

Companies and organizations are persistently collecting data that is usually characterized by its unprecedented volume, variety, and velocity and is often called big data. Whether it is structured, semi-structured, or unstructured, big data is useless unless it yields value and practical application. Big data analytics looks at millions, if not billions, of records to unmask hidden patterns and provide actionable insights and interpretable results for topics ranging from healthcare to operations management [5,12]. One area that has great potential to leverage the value of big data and analytics is the critical analysis of traffic accidents. Examining the risk factors correlated with the severity of injuries after motor vehicle crashes has become a thought-provoking and challenging research problem [32]. Investigation results help provide an in-depth understanding of the risks and suggest measures to potentially prevent these risks, thereby enhancing the well-being of individuals who may experience such accidents. Road safety is therefore a significant challenge in the United States and elsewhere. Innovative and upgraded safety measures are continually being developed and integrated into vehicles and highways to minimize car accidents and to alleviate the severity of injuries [14]. Despite relentless efforts to reduce car accidents, a high number of accidents is still recorded every year.
According to recent statistics from the National Highway Traffic Safety Administration (NHTSA), six million traffic accidents were recorded, 30,000 lives were lost, and over 2.5 million people were injured within a single year [24]. On average, four people lost their lives and almost 300 people sustained injuries on US roadways every hour. According to reports presented by the NHTSA, the societal and economic harm amounts to about $871 billion in a single year. Of this total, $277 billion was attributed to financial costs, which is nearly $900 for each individual in the United States. The harm from decreased quality of life, pain, and loss of life due to injuries from car crashes was estimated at $594 billion [2,7]. Several factors were found to affect the severity of injuries

* Corresponding author at: Spears School of Business, Oklahoma State University, 700 North Greenwood Avenue, North Hall 302, Tulsa, Oklahoma 74106, USA.
E-mail addresses: kazim-topuz@utulsa.edu (K. Topuz), dursun.delen@okstate.edu, dursundelen@halic.edu.tr (D. Delen).
Journal homepage: www.elsevier.com/locate/dss
https://doi.org/10.1016/j.dss.2021.113557
Received 15 July 2020; Received in revised form 12 March 2021; Accepted 16 March 2021