Accident Analysis and Prevention 45 (2012) 478–486

Contents lists available at SciVerse ScienceDirect

journal homepage: www.elsevier.com/locate/aap

Using support vector machine models for crash injury severity analysis

Zhibin Li 1, Pan Liu ∗, Wei Wang 2, Chengcheng Xu 3

School of Transportation, Southeast University, Si Pai Lou #2, Nanjing 210096, China

Article info

Article history:
Received 20 July 2011
Received in revised form 22 August 2011
Accepted 28 August 2011

Keywords:
Support vector machine model
Ordered probit model
Crash severity
Freeway diverge area

Abstract

The study presented in this paper investigated the possibility of using support vector machine (SVM) models for crash injury severity analysis. Based on crash data collected at 326 freeway diverge areas, an SVM model was developed for predicting the injury severity associated with individual crashes. An ordered probit (OP) model was also developed using the same dataset, and the research team compared the performance of the two models. The SVM model predicted crash injury severity better than the OP model: its percentage of correct predictions was 48.8%, compared with 44.0% for the OP model. Even though the SVM model may suffer from the multi-class classification problem, it still provides better prediction results than the OP model for injury severity levels that account for a small proportion of crashes. The research also investigated the potential of using the SVM model for evaluating the impacts of external factors on crash injury severity. The sensitivity analysis results show that the SVM model produced results comparable to those of the OP model regarding the impacts of variables on crash injury severity.
For several variables, such as the length of the exit ramp and the shoulder width of the freeway mainline, the results of the SVM model are more reasonable than those of the OP model.

© 2011 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +86 025 83791816.
E-mail addresses: lizhibin@seu.edu.cn (Z. Li), pan liu@hotmail.com (P. Liu), wangwei@seu.edu.cn (W. Wang), iamxcc@163.com (C. Xu).
1 Tel.: +86 13952097374.
2 Tel.: +86 13905170160.
3 Tel.: +86 13801580045.

1. Introduction

The analysis of crash injury severity is of great interest to many transportation professionals. One of the main objectives of crash injury severity analysis is to understand the relationship between the injury severity of crashes and various contributing factors, such as driver and passenger characteristics, vehicle types, traffic conditions, geometric design characteristics, and collision types. Such information helps decision makers better understand the impacts of contributing factors on crash injury severity and implement treatments to reduce the severity of crashes.

Injury severity data are generally represented by discrete categories such as fatal, incapacitating injury, non-incapacitating injury, possible injury, and property damage only. Traditionally, transportation professionals use statistical models to evaluate the effects of contributing factors on crash injury severity. Among them, the ordered probit (OP) models and their variations are probably the most commonly used modeling techniques (Odonnell and Connor, 1996; Duncan et al., 1998; Kockelman and Kweon, 2002; Abdel-Aty, 2003; Zajac and Ivan, 2003; Abdel-Aty and Abdelwahab, 2004; Lee and Abdel-Aty, 2005; Siddiqui et al., 2006; Yau et al., 2006; Xie et al., 2009; Wang et al., 2011).
Some other statistical models have also been proposed for crash injury severity analysis, including the ordered logit model (Odonnell and Connor, 1996), the multinomial logit model (Shankar and Mannering, 1996; Khorashadi et al., 2005; Savolainen and Mannering, 2007), and the logistic regression model (Al-Ghamdi, 2002).

Even though traditional statistical models have been widely used for crash injury severity analysis, they suffer from some limitations. For example, traditional statistical modeling requires assumptions about the distribution of the data and, usually, a linear functional form between the dependent and explanatory variables. These assumptions may not always hold. When the basic assumptions of a traditional statistical model are violated, erroneous estimates and incorrect inferences may result (Mussone et al., 1999; Delen et al., 2006).

0001-4575/$ – see front matter. doi:10.1016/j.aap.2011.08.016

To overcome the limitations associated with traditional statistical models, previous researchers have proposed non-parametric methods and artificial intelligence models for crash injury severity analysis. These models include the classification and regression tree (CART) model (Sohn and Shin, 2001; Karlaftis and Golias, 2002; Chang and Wang, 2006), the Bayesian network model (Simoncic, 2004; de Ona et al., 2011), and the artificial neural network models (Abdelwahab and Abdel-Aty, 2002; Delen et al., 2006; Xie et al., 2007). As compared to traditional parametric models such