Accident Analysis and Prevention 45 (2012) 478–486
Journal homepage: www.elsevier.com/locate/aap
Using support vector machine models for crash injury severity analysis
Zhibin Li 1, Pan Liu ∗, Wei Wang 2, Chengcheng Xu 3
School of Transportation, Southeast University, Si Pai Lou #2, Nanjing 210096, China
Article info
Article history:
Received 20 July 2011
Received in revised form 22 August 2011
Accepted 28 August 2011
Keywords:
Support vector machine model
Ordered probit model
Crash severity
Freeway diverge area
Abstract
This study investigated the use of support vector machine (SVM) models for crash injury severity
analysis. Based on crash data collected at 326 freeway diverge areas, an SVM model was developed
to predict the injury severity of individual crashes. An ordered probit (OP) model was also
developed using the same dataset, and the performance of the two models was compared. The SVM
model predicted crash injury severity better than the OP model did: its percentage of correct
predictions was 48.8%, compared with 44.0% for the OP model. Even though the SVM model may suffer
from the multi-class classification problem, it still predicted the injury severity levels that
account for a small proportion of crashes better than the OP model did.
The research also investigated the potential of using the SVM model to evaluate the impacts of
external factors on crash injury severity. The sensitivity analysis shows that the SVM model
produced results comparable to those of the OP model regarding the impacts of variables on crash
injury severity. For several variables, such as the length of the exit ramp and the shoulder width
of the freeway mainline, the results of the SVM model are more reasonable than those of the OP model.
© 2011 Elsevier Ltd. All rights reserved.
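The abstract compares the two models by their percentage of correct predictions, both overall and for the less frequent severity levels. As a minimal sketch of how such a measure can be computed from observed and predicted severity labels, consider the following. The function names and the toy labels are hypothetical illustrations, not the authors' code or the study's data.

```python
from collections import Counter


def percent_correct(observed, predicted):
    """Overall percentage of crashes whose predicted severity equals the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("need at least one crash record")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * hits / len(observed)


def percent_correct_by_class(observed, predicted):
    """Percentage of correct predictions within each observed severity level."""
    totals = Counter(observed)
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)
    return {level: 100.0 * hits[level] / totals[level] for level in totals}


# Hypothetical toy labels for illustration only; NOT data from the study.
observed = ["PDO", "PDO", "PDO", "PDO", "possible", "possible", "non-incap", "fatal"]
predicted = ["PDO", "PDO", "PDO", "possible", "possible", "PDO", "non-incap", "PDO"]

overall = percent_correct(observed, predicted)
by_class = percent_correct_by_class(observed, predicted)

print(f"overall: {overall:.1f}%")
for level, pct in by_class.items():
    print(f"{level}: {pct:.1f}%")
```

Reporting the per-level figures alongside the overall one matters because a model can reach a high overall percentage simply by predicting the most frequent severity level, which is why the abstract also comments on the small-proportion injury severities.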
1. Introduction
The analysis of crash injury severity is of great interest to many
transportation professionals. One of its main objectives is to
understand the relationship between the injury severity of crashes
and various contributing factors, such as driver and passenger
characteristics, vehicle types, traffic conditions, geometric design
characteristics, and collision types. Such information helps
decision makers better understand the impacts of contributing
factors on crash injury severity and implement treatments to
reduce the severity of crashes.
Injury severity data are generally represented by discrete
categories such as fatal, incapacitating injury, non-incapacitating
injury, possible injury, and property damage only. Traditionally,
transportation professionals have used statistical models to evaluate
the effects of contributing factors on crash injury severity. Among
them, ordered probit (OP) models and their variations are probably
the most commonly used modeling techniques (Odonnell and
Connor, 1996; Duncan et al., 1998; Kockelman and Kweon, 2002;
Abdel-Aty, 2003; Zajac and Ivan, 2003; Abdel-Aty and Abdelwahab,
2004; Lee and Abdel-Aty, 2005; Siddiqui et al., 2006; Yau et al.,
2006; Xie et al., 2009; Wang et al., 2011). Other statistical
models have also been proposed for crash injury severity analysis,
including the ordered logit model (Odonnell and Connor,
1996), the multinomial logit model (Shankar and Mannering, 1996;
Khorashadi et al., 2005; Savolainen and Mannering, 2007), and the
logistic regression model (Al-Ghamdi, 2002).
∗ Corresponding author. Tel.: +86 025 83791816.
E-mail addresses: lizhibin@seu.edu.cn (Z. Li), pan liu@hotmail.com (P. Liu),
wangwei@seu.edu.cn (W. Wang), iamxcc@163.com (C. Xu).
1 Tel.: +86 13952097374.
2 Tel.: +86 13905170160.
3 Tel.: +86 13801580045.
Even though traditional statistical models have been widely
used for crash injury severity analysis, they suffer from some
limitations. For example, traditional statistical modeling requires
assumptions about the distribution of the data and, usually, a linear
functional form between the dependent and explanatory variables.
These assumptions may not always hold true. When the basic
assumptions of traditional statistical models are violated, erroneous
estimates and incorrect inferences can result (Mussone
et al., 1999; Delen et al., 2006).
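To make these assumptions concrete, the textbook ordered probit specification can be written as follows. This is the standard formulation rather than one quoted from this paper, and the symbols (latent severity y_i^*, covariate vector x_i, coefficients β, thresholds μ_j, number of severity levels J) are notation assumed here for illustration.

```latex
\begin{align}
  y_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
    \qquad \varepsilon_i \sim N(0, 1), \\
  y_i &= j \quad \text{if } \mu_{j-1} < y_i^{*} \le \mu_j,
    \qquad j = 1, \dots, J, \quad \mu_0 = -\infty, \ \mu_J = +\infty, \\
  P(y_i = j \mid \mathbf{x}_i) &=
    \Phi\!\left(\mu_j - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)
    - \Phi\!\left(\mu_{j-1} - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right).
\end{align}
```

The distributional assumption is the standard normal error term, and the linear functional form is the index x_i'β. Both are fixed before the data are seen, which is the restriction the paragraph above refers to.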
To overcome the limitations associated with traditional statistical
models, previous researchers have proposed non-parametric
methods and artificial intelligence models for crash injury severity
analysis. These models include the classification and regression
tree (CART) model (Sohn and Shin, 2001; Karlaftis and Golias, 2002;
Chang and Wang, 2006), the Bayesian network model (Simoncic,
2004; de Ona et al., 2011), and artificial neural network models
(Abdelwahab and Abdel-Aty, 2002; Delen et al., 2006; Xie et al.,
2007). As compared to traditional parametric models such
doi:10.1016/j.aap.2011.08.016