Accident Analysis and Prevention xxx (2012) xxx–xxx (article in press)
http://dx.doi.org/10.1016/j.aap.2012.06.014

Individual driver risk assessment using naturalistic driving data

Feng Guo a,∗, Youjia Fang b

a Department of Statistics, Virginia Tech Transportation Institute, Virginia Tech, 406A Hutcheson Hall, Blacksburg, VA 24061-0439, USA
b Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA

Article info

Article history:
Received 30 November 2011
Received in revised form 6 June 2012
Accepted 18 June 2012

Keywords:
Individual driver risk
Naturalistic Driving Study
NEO-5 Personality inventory
Critical incident
K-means cluster

Abstract

Driving risk varies substantially among drivers. Identifying and predicting high-risk drivers will greatly benefit the development of proactive driver education programs and safety countermeasures. The objective of this study is twofold: (1) to identify factors associated with individual driver risk and (2) to predict high-risk drivers using demographic, personality, and driving characteristic data. The 100-Car Naturalistic Driving Study was used for methodology development and application. A negative binomial regression model was adopted to identify significant risk factors. The results indicated that the driver's age, personality, and critical incident rate had significant impacts on crash and near-crash risk. For the second objective, drivers were classified into three risk groups based on crash and near-crash rate using a K-means cluster method.
The cluster analysis identified approximately 6% of drivers as high-risk drivers, with an average crash and near-crash (CNC) rate of 3.95 per 1000 miles traveled, 12% of drivers as moderate-risk drivers (average CNC rate = 1.75), and 84% of drivers as low-risk drivers (average CNC rate = 0.39). Two logistic models were developed to predict the high- and moderate-risk drivers. Both models showed high predictive power, with areas under the receiver operating characteristic curve of 0.938 and 0.930. This study concluded that crash and near-crash risk for individual drivers is associated with critical incident rate, demographic, and personality characteristics. Furthermore, the critical incident rate is an effective predictor for high-risk drivers.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

The substantial variation in individual driving risk has been documented in many studies (Deery and Fildes, 1999; Ulleberg, 2001; Dingus et al., 2006). Identifying factors associated with individual driving risk and predicting high-risk drivers will enable proper driver-behavior intervention and safety countermeasures to reduce the crash likelihood of high-risk groups and improve overall driving safety.

Traffic safety research involves drivers, vehicles, and the driving environment. There is an extensive literature on the safety impact of transportation infrastructure and traffic characteristics, e.g., the impacts of intersection design features, pavement conditions, weather, and traffic flow conditions (Hauer et al., 1988; Poch and Mannering, 1996; Maze et al., 2006; Guo et al., 2010; Lord and Mannering, 2010). Crash occurrence is the primary risk measure for infrastructure-related safety impact evaluation, with Poisson and negative binomial (NB) models being the state-of-practice analysis tools. However, there is limited research on individual driver risk in the traffic and human factors engineering fields.

∗ Corresponding author.
Tel.: +1 540 231 1038; fax: +1 540 231 3863. E-mail addresses: feng.guo@vt.edu (F. Guo), youjia@vt.edu (Y. Fang).

In contrast to traffic engineering, the insurance and actuarial science industries have a long history of research on classifying drivers by risk level to facilitate underwriting and pricing. Estimating the occurrence of claims based on the driver's age and other relevant variables has been standard practice in actuarial research (Segovia-Gonzalez et al., 2009). For the insurance industry, quantified individual risk is directly related to the risk classification standards (Walters, 1981). However, insurance data are proprietary and, in general, not available for public access.

Individual driver risk can be affected by many factors. Besides demographic variables such as age and gender, driver personality, commonly measured by the NEO five-trait inventory or Zuckerman's Sensation Seeking Scale, also plays an important role in individual driving risk (Costa and McCrae, 1992). Studies have shown the association between personality characteristics and risky driving behavior (Jonah, 1997; Jonah et al., 2001; Ulleberg and Rundmo, 2003; Dahlen and White, 2006; Machin and Sankey, 2008).

Driver behavior plays a central role in driver risk, but it is difficult to measure in real-world driving situations. Recent developments in vehicle instrumentation techniques, such as those used in the Naturalistic Driving Study (NDS) (University of Michigan Transportation Research Institute, 2005; Dingus et al., 2006; Guo and Hankey, 2009) and the DriveCam system (Hickman et al., 2010), have made it both technologically possible and economically feasible to monitor driving