Accident Analysis and Prevention 38 (2006) 542–555

Estimation of incident clearance times using Bayesian Networks approach

Kaan Ozbay ∗, Nebahat Noyan

Civil and Environmental Engineering and Center for Advanced Infrastructure and Transportation (CAIT), Rutgers University, 632 Bowser Road, Piscataway, NJ 08854-8014, USA

Received 29 January 2005; received in revised form 11 November 2005; accepted 28 November 2005

Abstract

Effective incident management requires a full understanding of various characteristics of incidents to accurately estimate incident durations and to help make more efficient decisions to reduce the impact of non-recurring congestion due to these incidents. Our goal is thus to have a comprehensive and clear description of incident clearance patterns and to represent these patterns with formalisms based on Bayesian Networks (BNs). BNs can be used to create dynamic incident duration estimation trees that can be extracted in the presence of a real incident for which data might only be partially available. This capability will enable traffic operators to create case-specific incident management strategies in the presence of incomplete information. In this paper, we employ a unique database created using incident data collected in Northern Virginia. This database is then used to demonstrate the advantages of employing BNs as a powerful modeling and analysis tool, especially due to their ability to consider the stochastic variations of the data and to allow bi-directional induction in decision-making. In addition to presenting the basic theory behind BNs in the context of our problem and validating our estimation results, we discuss in detail the dependency relations among all variables in the estimated BN, which can be used for both quantitative and qualitative analysis.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Bayesian Networks; Incident management; Decision trees; Probabilistic inference

∗ Corresponding author. Tel.: +1 732 445 2792; fax: +1 732 445 0577.
E-mail addresses: kaan@rci.rutgers.edu (K. Ozbay), mableu@eden.rutgers.edu (N. Noyan).

0001-4575/$ – see front matter. doi:10.1016/j.aap.2005.11.012

1. Introduction

Incident management involves detection, verification and clearance of traffic incidents, as well as minimizing the effects of congestion by reliably estimating incident durations. Decisions regarding traffic diversion need to be made based on the predicted incident durations. These durations should be as accurate as possible, not only for better decision-making but also for disseminating reliable congestion information to drivers. Some incident duration estimation methods (Smith and Smith, 2001) employ a set of decision trees developed using standard classification techniques proposed by Breiman et al. (1984). Ozbay and Kachroo (1999) also suggested the use of decision trees for incident duration estimation. However, decision trees can sometimes be unstable and insensitive to the stochastic nature of data. The nodes of traditional classification trees proposed by Breiman et al. (1984) have fixed average values that do not allow the decision maker to model the stochastic nature of the parent–child relationships in a realistic way. Moreover, one-way relationships do not allow bi-directional induction for decision-making.

BNs constitute one of the most popular formalisms for reasoning and prediction under uncertainty. This study uses BNs as a knowledge discovery process to accurately predict incident durations. A BN is a graph in which nodes represent stochastic variables and arcs represent dependencies among these variables. Ordinarily, assigning a value to a variable determines its state; because the variables used in a BN are stochastic, their states are instead determined by probability distributions. BNs offer an effective way to describe the overall dependency structure of a large number of variables, thus removing the limitation of examining only the pairwise associations between variables. Furthermore, one can easily investigate undirected relationships between the variables, in addition to making predictions and providing explanations, by querying the network.

The purpose of this research is to develop a model that can automatically learn emerging patterns in data to aid in the prediction of incident clearance times. By assigning values from a real data set to the decision variables (predictors), incident patterns