Abbreviations: NHTSA, National Highway Traffic Safety Administration; IIHS, Insurance Institute for Highway Safety; RLC, red light camera; RTM, regression to the mean; SPF, safety performance function; AADT, annual average daily traffic; GOF, goodness of fit; AIC, Akaike information criterion; DF, degrees of freedom

Introduction

During 2012, approximately 48% of U.S. crashes occurred at an intersection or were intersection-related, and over half of these (53%) occurred at signalized intersections [1]. This is a disproportionate share of crashes, considering that signalized intersections constitute only 10% of all intersections within the U.S. [2] In addition, crashes at signalized intersections result in considerable numbers of injuries and fatalities. According to the National Highway Traffic Safety Administration (NHTSA), 4,460 fatal crashes and 840,000 injury crashes occurred at signalized intersections during 2012 [1]. Despite national prevention efforts targeting this public health problem, the proportion of fatal crashes occurring at intersections with traffic signals increased 35% between 2000 and 2012 [1,3].

Many signalized intersection crashes can be attributed to red light running, which accounts for 22% of urban collisions and over one-fourth of all injury collisions [4]. According to the U.S. Department of Transportation, approximately 56% of Americans acknowledge running a red light [5]. The Insurance Institute for Highway Safety (IIHS) estimated that 683 persons were killed and another 133,000 persons were injured in red light running crashes during 2012 [6]. The IIHS also states that half of those killed in red light running crashes are not the signal violators but the drivers and pedestrians who were struck [7]. The costs associated with red light running crashes are also significant.
An examination of the safety impact of red light running crashes at intersections in the state of Texas found that these crashes carry a societal cost of $2 billion annually statewide [8].

Several interventions have been implemented to decrease the risk of red light running crashes, including police enforcement, educational campaigns, and engineering modifications such as signal timing changes. Red light cameras (RLCs), however, are increasingly being used to discourage red light runners and decrease related crashes. Determining whether RLCs are effective is difficult for several reasons [9]. One issue is the phenomenon known as regression to the mean (RTM). Because cameras are typically installed at the sites with the highest numbers of violations and/or crashes rather than at randomly assigned sites, a subsequent reduction in the outcome analyzed could simply reflect RTM; that is, counts drifting back toward the average for the area whether or not any intervention was implemented. If RTM is not accounted for, estimates of the benefit of RLCs may be biased [10].

Models that employ an Empirical Bayes analysis allow researchers to account for RTM bias by estimating the number of collisions based on crash counts prior to RLC installation at treatment and comparison sites. The Empirical Bayes method requires an accident prediction model, i.e., a safety performance function (SPF): a multiple regression formula that fits collision data for comparison intersections to a set of independent variables that may be expected to affect safety, such as speed limit or number of straight-through lanes. SPFs are used to assist agencies in network screening, that is, identifying sites that may benefit from a safety treatment. In addition, SPFs can be instrumental for countermeasure comparisons and project evaluations [11]. To properly develop an SPF using motor vehicle crash data, the best-fitting model must be determined.
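The RTM correction described above can be illustrated with a short sketch. It assumes the weighting formula commonly used in the road-safety literature for Empirical Bayes estimates, in which the SPF prediction and the site's own crash count are combined with a weight that depends on the negative binomial overdispersion parameter. The article does not state this formula, so the function name, its parameters, and the numbers in the usage line are all hypothetical.

```python
def eb_expected_crashes(observed, spf_predicted, dispersion):
    """Empirical Bayes estimate of expected crashes at one site (illustrative).

    observed      -- crash count recorded at the site before treatment
    spf_predicted -- crashes predicted by the SPF for similar sites, same period
    dispersion    -- assumed negative binomial overdispersion parameter,
                     with variance = mu + dispersion * mu**2
    """
    if spf_predicted <= 0 or dispersion < 0:
        raise ValueError("spf_predicted must be > 0 and dispersion must be >= 0")
    # Weight given to the SPF prediction; it shrinks as overdispersion grows,
    # so noisier reference populations lean more on the site's own count.
    weight = 1.0 / (1.0 + dispersion * spf_predicted)
    return weight * spf_predicted + (1.0 - weight) * observed


# Hypothetical site: 12 observed crashes where the SPF predicts 6.
estimate = eb_expected_crashes(observed=12, spf_predicted=6.0, dispersion=0.5)
print(estimate)
```

The estimate always falls between the SPF prediction and the observed count, which is how the method tempers an unusually high pre-installation count at a site selected for its crash history.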
Although linear regression models can be thought of as a good starting point, most researchers decline to use this statistical method. Previous crash studies have described the problems with linear regression models, including the lack of a distribution that adequately describes random, discrete, nonnegative, and sporadic events such as motor vehicle accidents [12]. Because of these problems, subsequent crash studies have adopted other models to develop SPFs, including 1) Poisson regression, which is used to analyze data that are Poisson distributed, and 2) negative binomial regression, which accounts for overdispersion. Although these two models possess desirable characteristics to explain motor vehicle

Biom Biostat Int J. 2016;4(3):94‒99.
©2016 Anthoni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.

Identification of an accident prediction model for red light camera analysis

Volume 4 Issue 3 - 2016

Anthoni L, Nasar U Ahmed
Department of Epidemiology, Florida International University, Florida

Correspondence: Nasar U Ahmed, Department of Epidemiology, Robert Stempel College of Public Health, Florida International University, AHC5-468 Miami, Florida 33199, Florida, Email

Received: June 16, 2016 | Published: July 27, 2016

Abstract

The purpose of this article was to develop an accident prediction model for motor vehicle crashes occurring within Miami-Dade County, Florida during 2008-2011. Motor vehicle crash data for 40 intersections within Miami-Dade County, Florida were extracted from the Florida Department of Motor Vehicle and Highway Safety dataset for development of the model. Each intersection was matched to at least one of 20 red light camera (RLC) sites using selected geometric variables. In addition, each intersection examined was at least 2 miles away from any RLC site.
The dependent variable examined was the number of injury crashes occurring at each intersection between 2008 and 2011. Poisson, negative binomial, and gamma model distributions were compared using the Pearson's chi-square ($\chi^2$), scaled deviance ($G^2$), and Akaike Information Criterion (AIC) goodness-of-fit statistics. Our analysis indicated that the negative binomial distribution was the best fit among the three models. Inspection of the observed data also suggested that the outcome variable's distribution was overdispersed. This study provides guidance on the use of goodness-of-fit (GOF) statistics for Poisson, negative binomial, and gamma models, which will allow other researchers to evaluate different models.

Keywords: accident prediction model, empirical Bayes, red light cameras, motor vehicle crashes, goodness of fit

Biometrics & Biostatistics International Journal
Research Article | Open Access
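As a rough illustration of the goodness-of-fit comparison named in the abstract, the sketch below computes Pearson's chi-square per degree of freedom, the Poisson deviance, and AIC for Poisson and negative binomial fits. It is not the authors' analysis: the counts are hypothetical, the model is intercept-only, the gamma model is omitted, and the overdispersion parameter comes from a simple method-of-moments estimate rather than maximum likelihood, so the AIC values are indicative only.

```python
import math


def pearson_chi2(observed, fitted, alpha=0.0):
    """Pearson chi-square; alpha=0 gives the Poisson variance function."""
    return sum((y - mu) ** 2 / (mu + alpha * mu ** 2)
               for y, mu in zip(observed, fitted))


def poisson_deviance(observed, fitted):
    """Poisson deviance (G^2); the y*log(y/mu) term is zero when y = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        if y > 0:
            total += y * math.log(y / mu)
        total -= y - mu
    return 2.0 * total


def poisson_loglik(observed, fitted):
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(observed, fitted))


def negbin_loglik(observed, fitted, alpha):
    """Negative binomial log-likelihood, variance = mu + alpha * mu**2."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    r = 1.0 / alpha
    return sum(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
               for y, mu in zip(observed, fitted))


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def moment_alpha(observed):
    """Method-of-moments overdispersion estimate (an assumption, not MLE)."""
    n = len(observed)
    mean = sum(observed) / n
    var = sum((y - mean) ** 2 for y in observed) / (n - 1)
    return max((var - mean) / mean ** 2, 0.0)


# Hypothetical injury-crash counts for eight intersections (not study data).
counts = [0, 1, 1, 2, 3, 5, 9, 19]
fitted = [sum(counts) / len(counts)] * len(counts)  # intercept-only model
df = len(counts) - 1
alpha = moment_alpha(counts)

print("Poisson chi2/DF:", pearson_chi2(counts, fitted) / df)
print("Poisson G2/DF:  ", poisson_deviance(counts, fitted) / df)
print("NB chi2/DF:     ", pearson_chi2(counts, fitted, alpha) / df)
print("Poisson AIC:    ", aic(poisson_loglik(counts, fitted), 1))
print("NB AIC:         ", aic(negbin_loglik(counts, fitted, alpha), 2))
```

Values of chi-square/DF or deviance/DF well above 1 under the Poisson model are a common sign of overdispersion, and a lower AIC indicates the preferred model among those compared.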