A Conjoint Analysis of Road Accident Data using K-modes Clustering and Bayesian Networks
(Road Accident Analysis using clustering and classification)

¹Sachin Kumar, ²Vijay Bhasker Semwal, ³Vijender Kumar Solanki
¹Indian Institute of Technology Roorkee
²Indian Institute of Information Technology Dharwad
³Institute of Technology and Science Ghaziabad, Uttar Pradesh
{sachinagnihotri16, vsemwal, spesinfo}@gmail.com

Prayag Tiwari, Denis Kalitin
Computer Science Department, National University of Science & Technology MISIS, Moscow, Russian Federation
{prayagforms, kalitindv}@gmail.com

Abstract— Road and traffic accidents are one of the important concerns in today's world. Every country suffers huge damage from road accidents in terms of public health and property loss. Therefore, road accident analysis plays an important role in the public health domain. Road accident analysis is performed in order to identify the associated factors that are responsible for road accidents. Knowledge of these factors is very useful for understanding the circumstances of road accidents and can be used to avoid them. One of the problems in accident analysis is that most road accident data is biased in nature. For example, critical road accidents are very few in comparison to slight/minor injury accidents. Various studies have reported that clustering prior to analysis can increase the efficiency and accuracy of classification. The motive of this study is to perform a conjoint analysis on road accident data, to investigate improvement in the performance of classification of unbiased data after clustering.

Index Terms—Clustering; Road Safety; Bayesian Network; Accident Analysis

I. INTRODUCTION

Road and traffic accidents [1] are among the greatest harms that transportation inflicts on public health. The transportation system itself is not responsible for these accidents; several other factors are [2, 3].
These factors can be grouped as environmental factors such as weather and temperature; road-specific factors such as road type, road width, and road shoulder width; human factors such as wrong-side driving and excessive driving speed; and other factors. Whenever a road accident takes place on any road in the world, some of these factors are involved. The factors and their influence on road accidents are not the same in all countries; they affect road accidents in different countries in different ways. Several studies [4-13] have focused on identifying these factors so that a relationship between accident factors and accident severity can be established. This relationship can be used to reduce the accident rate by suggesting preventive measures [13].

The analysis of road and traffic accidents is widely known as road and traffic safety, in which the outcome of accident analysis is used for traffic accident prevention. The literature in the traffic safety domain is quite rich: it comprises several research studies [14-20] on road accident data analysis using techniques such as statistical techniques, mathematical models, data mining, and machine learning.

Classification accuracy is one of the most important parameters for evaluating the performance of a classifier on a given data set. But if the data is not balanced, that is, if the class values of the target attribute are not uniformly distributed, the classifier accuracy can be biased. In this study, we use k-modes clustering and Bayesian networks to perform a conjoint analysis on imbalanced road accident data from Leeds, UK, in which severe injury accidents and slight injury accidents differ greatly in their counts.
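The effect of class imbalance on accuracy can be illustrated with a small sketch. The label counts below (90 slight, 10 severe) are hypothetical and chosen only for illustration; they are not taken from the Leeds data set, and the function names `accuracy` and `per_class_recall` are likewise assumptions introduced here. The sketch shows that a trivial classifier that always predicts the majority class obtains a high overall accuracy while never identifying a severe accident.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its records predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced severity labels (illustrative only, not the Leeds data).
y_true = ["slight"] * 90 + ["severe"] * 10

# Trivial baseline: always predict the most frequent class.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

overall = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
# Balanced accuracy: the unweighted mean of the per-class recalls.
balanced = sum(recall.values()) / len(recall)

print(f"overall accuracy:  {overall:.2f}")
print(f"per-class recall:  {recall}")
print(f"balanced accuracy: {balanced:.2f}")
```

Reporting per-class recall or balanced accuracy next to overall accuracy is one common way to expose this bias; it is shown here as an illustration and is not necessarily the evaluation measure used in this study.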
The results reveal that although conjoint analysis on imbalanced data is efficient enough to improve the accuracy of the classifier, it is not guaranteed that all clusters will achieve a biased classification or improved performance that can be achieved without clustering.

The paper is organized as follows. Section II describes the data set used and the methodology adopted for this study. Section III presents the experimental results and discussion. Finally, Section IV concludes.

II. MATERIALS AND METHODS

A. Data Set

The data set used for this study is obtained from Leeds, UK [21]. The data set consists of 14 attributes and 1,246 accident records covering the five years from 2011 to 2015.

Proceedings of the Second International Conference on Research in Intelligent and Computing in Engineering, pp. 53–56, © 2017

The