Decision Support Systems 150 (2021) 113557
Available online 20 March 2021
0167-9236/© 2021 Elsevier B.V. All rights reserved.
A probabilistic Bayesian inference model to investigate injury severity in
automobile crashes
Kazim Topuz a, Dursun Delen b,c,*
a School of Finance, Operations Management and International Business, University of Tulsa, Tulsa, OK, USA
b Department of Management Science and Information Systems, Oklahoma State University, Stillwater, OK, USA
c School of Management, Halic University, Istanbul 34445, Turkey
ARTICLE INFO
Keywords:
Data science
Bayesian inference
Injury severity
Decision making
Cross-entropy loss function
Bayesian missing data handling
ABSTRACT
Big data analytics examines millions, if not billions, of records to unmask hidden patterns and to provide
actionable insights and interpretable results for various domains. One area with great potential to leverage the
value of big data and analytics is the critical analysis of traffic accidents. Investigation results provide an
in-depth understanding of the risks and suggest measures to mitigate these risk factors, thereby enhancing the
well-being of individuals who may experience such accidents. This study explains existing models and proposes a
data science methodology in a field where probabilistic modeling makes much sense for faster, better
decision-making. The main objective of this data analytics study is to identify the high-risk factors, along with
their apparent significance, that influence the probability of injury severity in automobile crashes using a
geographically representative car crash dataset. To obtain reliable, accurate, and intuitive results, a multi-step
probabilistic inference model based on the Bayesian Belief Network, a highly acclaimed machine learning
methodology, is proposed. The underlying inference model provides researchers with a causally accurate way to
explore the domain (with subject matter expert input) while disentangling issues related to statistical
correlations and causal effects. In this study, we also used the data to create a web-based probabilistic inference
simulator, a Bayesian inference decision support tool that will be publicly available, to help decision-makers
better understand variable interdependencies and conduct what-if analyses on them.
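As a minimal sketch of the kind of what-if analysis a Bayesian Belief Network supports, the toy example below computes conditional probabilities by exhaustive enumeration. The three variables (belt, speeding, severe), the network structure, and every probability value are hypothetical and chosen only for illustration; none of them come from the study's dataset, model, or web-based simulator.

```python
from itertools import product

# Hypothetical priors (illustrative numbers only, not estimated from crash data).
P_BELT = {True: 0.85, False: 0.15}       # P(seat belt used)
P_SPEEDING = {True: 0.20, False: 0.80}   # P(driver was speeding)

# Hypothetical conditional table: P(severe injury | belt, speeding).
P_SEVERE = {
    (True, True): 0.10,
    (True, False): 0.03,
    (False, True): 0.40,
    (False, False): 0.15,
}


def joint(belt, speeding, severe):
    """Joint probability from the network factorization
    P(belt) * P(speeding) * P(severe | belt, speeding)."""
    p_sev = P_SEVERE[(belt, speeding)]
    return P_BELT[belt] * P_SPEEDING[speeding] * (p_sev if severe else 1.0 - p_sev)


def query(target, evidence=None):
    """Return P(target=True | evidence) by summing the joint over all worlds."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for belt, speeding, severe in product([True, False], repeat=3):
        world = {"belt": belt, "speeding": speeding, "severe": severe}
        # Skip worlds that contradict the observed evidence.
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(belt, speeding, severe)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print("Baseline P(severe):", query("severe"))
    print("What if no belt?   ", query("severe", {"belt": False}))
    print("P(belt | severe):  ", query("belt", {"severe": True}))
```

Enumeration is used here only because the toy network has three binary variables; a network of realistic size would need a dedicated inference algorithm or library.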
1. Introduction
Companies and organizations persistently collect data that is
usually characterized by its unprecedented volume, variety, and velocity
and is often called “big data.” Whether it is structured, semi-structured,
or unstructured, big data is useless unless it yields value and practical
application. Big data analytics looks at millions, if not billions, of records
to unmask hidden patterns and to provide actionable insights and
interpretable results for domains ranging from healthcare to operations
management [5,12]. One area with great potential to leverage the value of
big data and analytics is the critical analysis of traffic accidents. Examining
the risk factors correlated with the severity of injuries after motor vehicle
crashes has become a thought-provoking and challenging research
problem [32]. Investigation results provide an in-depth understanding
of the risks and suggest measures to mitigate them, thereby enhancing
the well-being of individuals who may experience such accidents.
Road safety therefore remains a significant challenge in the United
States and elsewhere.
Innovative and upgraded safety measures are continually being
developed and integrated into vehicles and highways to minimize car
accidents and to alleviate the severity of injuries [14]. Despite relentless
efforts to reduce car accidents, a high number of accidents is still
recorded every year. According to recent statistics from the National
Highway Traffic Safety Administration (NHTSA), six million traffic
accidents were recorded, 30,000 lives were lost, and over 2.5 million
people were injured within a year [24]. On average, four people
lost their lives and almost 300 people sustained injuries on US
roadways every hour. According to reports presented by the NHTSA,
the societal and economic harm amounts to about $871 billion in a
single year. Of this estimated total, $277 billion was attributed
to financial costs, or nearly $900 for each person in the United
States. The harm from decreased quality of life, pain, and loss
of life because of injuries from car crashes was estimated to be $594
billion [2,7]. Several factors were found to affect the severity of injuries
* Corresponding author at: Spears School of Business, Oklahoma State University, 700 North Greenwood Avenue, North Hall 302, Tulsa, Oklahoma 74106, USA.
E-mail addresses: kazim-topuz@utulsa.edu (K. Topuz), dursun.delen@okstate.edu, dursundelen@halic.edu.tr (D. Delen).
https://doi.org/10.1016/j.dss.2021.113557
Received 15 July 2020; Received in revised form 12 March 2021; Accepted 16 March 2021