International Journal of Computer Applications (0975-8887), Volume 35, No. 12, December 2011

Classification of Vehicle Collision Patterns in Road Accidents using Data Mining Algorithms

S. Shanthi, Senior Lecturer (Ph.D. Research Scholar), Department of Computer Science & Engineering, Rajalakshmi Institute of Technology (Affiliated to Anna University, Chennai), Kuthambakkam, Chennai, India
Dr. R. Geetha Ramani, Professor and Head, Department of Computer Science & Engineering, Rajalakshmi Engineering College (Affiliated to Anna University, Chennai), Thandalam, Chennai, India

ABSTRACT
This paper emphasizes the importance of data mining classification algorithms in predicting the vehicle collision patterns that occur in a training accident data set. It aims to derive classification rules that can be used to predict the manner of collision. The classification algorithms C4.5, C-RT, CS-MC4, Decision List, ID3, Naïve Bayes and RndTree have been applied to predict vehicle collision patterns. The road accident training data set was obtained from the Fatality Analysis Reporting System (FARS), which is available in the University of Alabama’s Critical Analysis Reporting Environment (CARE) system. The experimental results indicate that the RndTree classification algorithm achieved better accuracy than the other algorithms in classifying the manner of collision, a factor that increases the fatality rate in road accidents. The feature selection algorithms CFS, FCBF, Feature Ranking, MIFS and MODTree have also been explored to improve classifier accuracy. The results show that the Feature Ranking method significantly improved the accuracy of the classifiers.

General Terms
Data Mining, Classification Algorithms, Feature Selection, Accident Data Analysis

Keywords
Classification Algorithms, Feature Selection Algorithms, Manner of Collision, Fatal Severity, Collision Patterns, Prediction

1. INTRODUCTION
The ever-increasing amount of data collected and stored in large and numerous databases has far exceeded the human ability to comprehend it without powerful tools [3]. Consequently, important decisions are often based not on the information-rich data stored in databases but on a decision maker’s intuition, because tools to extract the valuable knowledge embedded in these vast amounts of data are lacking [3]. This is why data mining has received great attention in recent years. Data mining integrates techniques from multiple disciplines such as database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis [3][19]. General data mining tasks, including associations, sequential patterns, classification, prediction and clustering, can be applied to many areas. Classification algorithms in particular can extract useful patterns from data described by a large number of attributes.

The costs of fatalities and injuries due to traffic accidents have a great impact on society. The World Health Organization [14] predicts that road collisions will rise from the ninth leading cause of death in 2004 to the fifth in 2030. Much research therefore concentrates on analyzing the crash-related factors that increase the death rate. Fatal injuries resulting from road traffic accidents are one of the main areas of concern. Among road-related factors, the manner of collision influences the fatality rate. Accident databases are growing rapidly both spatially and temporally. This makes it quite a challenge to analyze them and extract useful information without advanced data analysis tools. The contribution of classification algorithms to analyzing road accident factors is discussed in the following sections.
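The classifiers named in the abstract differ in how they choose which attribute to test. For example, ID3 splits on the attribute with the highest information gain, and C4.5 uses a normalized variant of that score, the gain ratio. The same entropy-based score can also rank attributes before classification. The Python sketch below ranks categorical attributes by information gain with respect to the class. It is illustrative only. The attribute names (`light`, `surface`, `collision`) and the four records are hypothetical, not FARS fields. The Feature Ranking method evaluated in this study may use a different scoring criterion.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Drop in class entropy obtained by partitioning rows on attr's values."""
    n = len(rows)
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    # Weighted entropy that remains after the partition.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def rank_features(rows, attrs, target):
    """Return (attribute, gain) pairs, most informative attribute first."""
    scores = [(a, information_gain(rows, a, target)) for a in attrs]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy records (not FARS fields).
toy_records = [
    {"light": "day", "surface": "dry", "collision": "rear-end"},
    {"light": "day", "surface": "wet", "collision": "rear-end"},
    {"light": "dark", "surface": "dry", "collision": "head-on"},
    {"light": "dark", "surface": "wet", "collision": "head-on"},
]

print(rank_features(toy_records, ["light", "surface"], "collision"))
```

Running it prints `[('light', 1.0), ('surface', 0.0)]`. In the toy records, the light condition alone determines the collision type, so it removes all class entropy. The surface condition removes none.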
The next subsection gives an overview of the paper.

1.1 Organization of the paper
The paper is organized as follows. Section 2 summarizes related work in this area. Section 3 investigates the data set and discusses the system model. Section 4 discusses the preparation of the data for analysis and briefly describes the relevance analysis. Section 5 describes the classification algorithms used in the empirical study. The experimental results and observations are discussed in Section 6, and the conclusions and future research directions are presented in Section 7. Section 8 lists the references used in this study, and Section 9 gives the authors' profiles. The next section discusses related work carried out in this area.

2. LITERATURE SURVEY
Handan et al. [4] compared a logistic regression model with the classification tree method for determining the socio-demographic risk factors that affected the depression status of women in different postpartum periods. They concluded that the classification tree method gives more detailed diagnostic information than the logistic regression model because it evaluates many risk factors together. Chang et al. [2] applied non-parametric classification tree techniques to analyze accident data from Taiwan for the year 2001. They developed a CART model to find the relationship between injury severity and driver/vehicle characteristics, highway/environment variables and accident variables.
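CART, the model used by Chang et al. [2], grows binary trees by choosing the split that most reduces Gini impurity. Gini impurity is 1 minus the sum over classes of p_k squared, where p_k is the share of class k in a node. The Python sketch below scores binary splits of one categorical attribute by that criterion. It simplifies CART by trying only one-versus-rest partitions (`attr == value` against all other values). CART itself can consider any grouping of the values into two subsets. The field names (`collision`, `severity`) and the records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(rows, attr, target):
    """Score every one-versus-rest split 'attr == value' and return
    (value, impurity_decrease) for the best one (value is None if no
    split reduces impurity)."""
    parent_impurity = gini([r[target] for r in rows])
    n = len(rows)
    best_value, best_decrease = None, 0.0
    for value in sorted({r[attr] for r in rows}):
        left = [r[target] for r in rows if r[attr] == value]
        right = [r[target] for r in rows if r[attr] != value]
        # Impurity of the two children, weighted by their share of the rows.
        child_impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        decrease = parent_impurity - child_impurity
        if decrease > best_decrease:
            best_value, best_decrease = value, decrease
    return best_value, best_decrease


# Hypothetical toy records: collision type vs. accident severity.
toy_records = (
    [{"collision": "head-on", "severity": "fatal"}] * 2
    + [{"collision": "rear-end", "severity": "nonfatal"}] * 2
    + [{"collision": "angle", "severity": "nonfatal"}] * 2
)

print(best_binary_split(toy_records, "collision", "severity"))
```

Here the best split is `collision == 'head-on'`, with an impurity decrease of about 0.444 (4/9). Separating head-on collisions from the rest yields two pure children, so the whole parent impurity is removed.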