International Journal of Computer Applications (0975 – 8887)
National Conference on Advances in Computing Communication and Application (ACCA-2015)

Prediction and Analysis of Injury Severity in Traffic System using Data Mining Techniques

Dheeraj Khera
Research Scholar, Punjabi University, Patiala

Williamjeet Singh
Assistant Professor, Dept of Computer Engineering, Punjabi University, Patiala

ABSTRACT
Road traffic is an essential part of life, but frequent road accidents cause severe bodily harm and loss of property. Road Traffic Accidents (RTAs) are considered a major public health concern, resulting in an estimated 1.2 million deaths and 50 million injuries worldwide each year. The aim of this study is to examine the performance of different classification methods using the WEKA and TANAGRA tools on a Traffic Injury Severity Dataset. This paper compares the results of three supervised data mining algorithms, Naive Bayes, ID3 and Random tree, using several performance criteria: error rate, computing time, precision and accuracy. The comparison of the models using the WEKA experimenter showed that Naive Bayes outperforms Random tree and ID3, with accuracies of 50.7%, 45.07% and 25.35% respectively, while the comparison using the TANAGRA experimenter showed that Random tree outperforms Naive Bayes and ID3, with accuracies of 92.95%, 67.6% and 57.74% respectively. We conclude that TANAGRA is the better data mining tool compared with WEKA.

Keywords
Road Traffic Accidents, Data Mining, Naive Bayes, ID3, Random tree, Weka, Tanagra, Accuracy Measure

1. INTRODUCTION
Road traffic is a crucial part of life, but the numerous road accidents cause serious bodily harm and loss of property.
Every aspect of a road traffic accident involves a large amount of information, and data is the most common form in which the most important information is recorded. Data mining has been defined as the nontrivial extraction of previously unknown, implicit and potentially useful information from data. By mining road traffic accident data, we can analyze accident characteristics from multiple angles, at multiple levels and more comprehensively, and discover potentially useful information. Data mining is the science of extracting useful information from large databases, and it is one of the tasks in the process of knowledge discovery in databases [5].

The two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns that describe the data and can be interpreted by humans.

The aim of this study is to investigate the performance of different classification methods using WEKA and TANAGRA on a Traffic Injury Severity Dataset. Among the free data mining tools available today, this paper deals with the use of the classification technique in two of them: Tanagra and WEKA (Waikato Environment for Knowledge Analysis). The classifiers chosen for this purpose are Naïve Bayes, ID3 and Random tree.

The paper is organized as follows: Section 2 presents the Literature Survey; Section 3 describes Data Mining Tasks; Section 4 describes Methods & Material, including the training data set, supervised learning algorithms, accuracy measures, Weka and Tanagra; Section 5 gives the detailed outcome of the experiment and the comparative results of the tools used; Section 6 gives the result analysis; and Section 7 gives the conclusion and future work.

2.
LITERATURE SURVEY
The costs of fatalities and injuries due to traffic accidents have a great impact on society. In recent years, researchers have paid great attention to determining the factors that significantly affect driver injury severity in traffic accidents.

The author in [1] identifies the most important factors affecting injury severity using classification and regression trees. Crash data from the records of the Information and Technology Department of the Iranian Traffic Police from 2006 to 2008 were used to study hundreds of drivers involved in traffic crashes on the main two-lane two-way rural roads of Iran. The results indicated that seat belt use is the most important factor associated with the injury severity of traffic crashes, and that not using a seat belt significantly increases the probability of being injured or killed.

The author in [3] applies random forest and rough set theory to identify the factors that significantly influence single-vehicle crash (SVC) severity. Fifty-nine records of single-vehicle crashes were extracted from the road traffic accident data collected between January 2004 and May 2008 in Beijing. The results show that lighting conditions, vehicle type, driving experience and seat belt use are significant factors affecting the severity of an SVC.

The author in [4] presents a decision tree technique that predicts causes of accidents and accident-prone locations on highways. Using WEKA software to analyze accident data collected on the Lagos-Ibadan road, it was found that a decision tree can accurately predict the causes of accidents and the accident-prone locations along the road.

The author in [8] addresses the prediction of traffic incident duration and driver information systems. In this work, actual traffic incident data were used to study the problem of predicting traffic incident duration with a neural network.
A total of 660 sets of actual traffic incident data from a freeway management center were used to train a neural network model, and 170 sets of incident data from the same data collection, distinct from the training data, were used to test the model's predictions. The results show that the incident duration is statically predicted. In practice, as time passes and incident information gradually accumulates, the predicted incident duration should be dynamically updated to improve prediction accuracy.

Table 1 shows a sample of different data mining techniques used in traffic injury severity.
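
As an illustration only, the hold-out protocol described for [8] (660 training records and 170 disjoint test records) and the accuracy, error rate and precision criteria named in the abstract can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the code used in [8] or in this paper: the function names, the fixed random seed, the integer record identifiers and the toy severity labels are all hypothetical. Note also that [8] predicts a duration rather than a class label, so the classification measures shown here correspond to the criteria of this paper, not to those of [8].

```python
import random


def holdout_split(records, n_train, seed=0):
    """Shuffle a copy of the records and split it into disjoint train/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed is an assumption, for repeatability
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(actual, predicted):
    """Fraction of cases whose predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def error_rate(actual, predicted):
    """Fraction of misclassified cases, i.e. 1 - accuracy."""
    return 1.0 - accuracy(actual, predicted)


def precision(actual, predicted, label):
    """Of the cases predicted as `label`, the fraction that truly are `label`."""
    actual_of_predicted = [a for a, p in zip(actual, predicted) if p == label]
    if not actual_of_predicted:
        return 0.0  # convention chosen here when the label is never predicted
    return sum(a == label for a in actual_of_predicted) / len(actual_of_predicted)


# Hypothetical stand-in for 830 incident records (660 + 170), identified by index.
incident_ids = list(range(830))
train_ids, test_ids = holdout_split(incident_ids, n_train=660)
print(len(train_ids), len(test_ids))  # 660 170

# Toy severity labels (hypothetical), only to exercise the measures.
actual = ["minor", "severe", "minor", "fatal"]
predicted = ["minor", "minor", "minor", "fatal"]
print(accuracy(actual, predicted))    # 0.75
print(error_rate(actual, predicted))  # 0.25
print(precision(actual, predicted, "minor"))
```

Keeping the test records disjoint from the training records, as in [8], means the reported measures reflect performance on unseen cases rather than on memorized ones.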