Accident Prediction Models for Winter Road Safety
Does Temporal Aggregation of Data Matter?

Taimur Usman, Liping Fu, and Luis F. Miranda-Moreno

T. Usman and L. Fu, Department of Civil and Environmental Engineering, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada. L. F. Miranda-Moreno, Department of Civil Engineering and Applied Mechanics, McGill University, 817 Sherbrooke Street West, Montreal, Quebec H3A 2K6, Canada. Corresponding author: T. Usman, tusman@engmail.uwaterloo.ca.

Transportation Research Record: Journal of the Transportation Research Board, No. 2237, Transportation Research Board of the National Academies, Washington, D.C., 2011, pp. 144–151. DOI: 10.3141/2237-16

Most accident prediction models are developed with single-level count data models, such as the traditional negative binomial models with fixed or varying dispersion parameters, assuming independence of data. For many accident data sets in road safety analysis, especially those that are highly disaggregated (hourly data), a hierarchical structure in the data often manifests in some form of correlation. Crash prediction models developed with aggregate data could produce biased results because of the assumption of data independence and inflation of the adequacy of the model's explanation because of the use of aggregate data. The potential effects of data aggregation and correlation on accident prediction models are investigated. The analysis uses an accident database that includes hour-level and storm-level accident counts for individual winter snowstorms at four highway sections in Ontario, Canada. Models of two levels of aggregation, aggregated event-based models and disaggregated hourly based models, were developed. Data aggregation had a significant effect on model results, whereas the difference between conventional regression and multilevel regression was inconsequential.

Road safety is a source of significant concern for transportation officials and researchers. According to a World Health Organization report, about 1.2 million people are killed on roads worldwide each year, and as many as 50 million are injured. Continuation of this trend will make road accidents the third-largest cause of injuries worldwide by 2020 (1). Road accidents also result in high social costs. A report by Transport Canada estimates that the annual societal cost due to vehicle collisions exceeds $18 billion in the province of Ontario, Canada, alone (2). Significant resources have been allocated to various safety improvement programs involving engineering, education, and reinforcement solutions.

Development of cost-effective safety programs entails two important processes: identification of high-risk locations in the network of interest and development of cost-effective countermeasures. Both processes require accident models that can be used to predict and explain accident occurrences through various explanatory factors related to road geometry, vehicle and driver characteristics, weather, and road conditions.

Most accident prediction models belong to the class of count data regression models, in particular the negative binomial model, which assumes all data or cases are statistically independent. This assumption, however, may be violated when repeated observations over multiple periods (e.g., yearly accident counts) at the same locations (e.g., intersections and road segments) are used as independent cases in model calibration. A common solution for avoiding this problem is to aggregate data from multiple periods for each location into a single observation (e.g., combining monthly data into yearly data or combining multiyear observations into a single-year average). This aggregation treatment addresses the issue of data correlation but will likely result in loss of information and reduction in sample size (3).

This paper introduces a multilevel regression approach to capturing the clustered nature of some accident data. The investigation focuses on two specific questions: (a) What is the impact of using a disaggregate modeling approach? and (b) Do multilevel models give substantively different results than a single-level model? A data set compiled for winter road safety is used to examine these questions (4, 5). This accident database includes hourly accident counts for individual winter snowstorms on four highway sections in Ontario. This unique data structure allows development and comparison of models of two levels of aggregation: aggregated event-based and disaggregated hourly based models. The hourly observations are used to calibrate and compare single-level and multilevel models.

LITERATURE REVIEW

Road accident modeling has been an area of intensive research in the past few decades. A large number of statistical models have been developed and tested for their suitability to address a variety of complex issues related to accident data. The general consensus is that the negative binomial (NB) distribution is adequate in most cases for modeling road accident counts because of its ability to capture the common nature of overdispersion in accident data (4–20).

The NB model structure has been further extended by many researchers to improve its explanatory power and modeling flexibility. For example, a notable extension is the generalized negative binomial (GNB) model, which incorporates a varying dispersion parameter that is a function of a set of covariates. This makes the model capable of controlling for more heterogeneity than does the NB model. It has been shown that use of a varying dispersion parameter could improve model fit (4, 21–27).

Another extension is to assume that the error term in the NB model follows a normal distribution instead of a gamma distribution. Models of the resulting structure are known as Poisson lognormal (PLN) models. Such models are good for accident rates with heavier tails