Mining Forensic Medicine Data for Crime Prediction

A. Abdo 1, Hanan Fahmy 2, Amir Abobaker Shaker 3
Faculty of Computers and Information, Helwan University, Cairo, Egypt

Abstract- Crime prevention is an important issue to which governments give high priority in order to achieve security. This work presents a proposed framework for crime prediction using data mining techniques; it also aims to help the Egyptian government make strategic decisions to reduce crime. The proposed framework consists of six phases. The first phase is data preprocessing, in which two proposed routines extract age and gender from the Egyptian national number. The second phase builds the proposed data warehouse (FMA-DW). The third phase is the ETL strategy (extract, transform, and load data into the data warehouse), after which the data are divided into a training dataset and a test dataset for the next two phases. The fourth phase applies data mining techniques. The fifth phase is the evaluation phase, in which the test dataset is used to evaluate the performance and accuracy of the proposed framework. The sixth and final phase is the inference phase (simulator). Real-world data were collected from the Egyptian Forensic Medicine Authority to predict crime. The experimental results showed that the framework obtained acceptable results of about 98%.

Keywords- crime prediction, data mining, Naïve Bayes, ETL, data warehouse, hybrid technique.

I. INTRODUCTION

Crime is a public social problem that affects the economy of communities and daily life [1]; it also determines which places people should avoid [2]. There is a strong body of evidence to support the opinion that crime is predictable, because criminals tend to operate in their own areas. Criminals therefore tend to repeat the same types of crime that they have committed successfully in the past, at the same time and in the same area [3]. In the past, solving crimes was the prerogative of law enforcement officers.
Nowadays, with the increasing use of computers to track crimes, computer-based data analysis has helped law enforcement specialists speed up the process of solving crimes [4]. Enhancing information awareness is a critical objective for the Egyptian government; Egypt therefore sponsors the National Project for Law Enforcement to speed up litigation procedures. This project includes several sectors (police, justice, and forensic medicine), which will lead to the existence of a huge database.

Due to the increasing volume of data, there is a need for technologies to analyze these data. This study uses data mining, an effective technique for searching for useful and valuable information in huge volumes of data [5]. The complexity of the relationships between crime data also makes the study of crime an appropriate field for applying data mining techniques, and the knowledge gained from these techniques is useful for supporting law enforcement [6].

Several technologies were integrated in this study to apply data mining techniques for crime prediction, such as the data warehouse, which helps to support decision making [7]. One of the most important tasks of a data warehouse is gathering heterogeneous data from several sources and integrating them into a single dataset to monitor historical trends and patterns [8]. The proposed data warehouse (FMA-DW) was built using MS SQL Server 2008 for crime prediction from the forensic medicine database; FMA-DW therefore includes the measures relevant to crime prediction (crime type, area, date, persons involved in the crime, gender, age). An important step in creating a data warehouse is to extract, transform, and load data into the database (ETL). An ETL system extracts data from several sources, combines these sources to generate suitable data, and finally delivers this data in a suitable format to the DW so that end users can make decisions [9].
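The ETL flow described above, together with the derivation of age and gender from the national number, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly described 14-digit layout of the Egyptian national number (first digit as century code, next six digits as birth date YYMMDD, thirteenth digit odd for male and even for female). It uses an in-memory SQLite table as a stand-in for FMA-DW, which the paper builds on MS SQL Server 2008. The function names, the table name `crime_fact`, and the sample records are hypothetical.

```python
import sqlite3
from datetime import date
from typing import Dict, Iterable, Tuple

# Assumed century codes for the first digit of the national number.
CENTURY_BASE = {"2": 1900, "3": 2000}


def parse_national_number(nid: str, today: date) -> Tuple[int, str]:
    """Return (age, gender) from a 14-digit national number (assumed layout)."""
    if len(nid) != 14 or not nid.isdigit():
        raise ValueError("expected a 14-digit national number")
    if nid[0] not in CENTURY_BASE:
        raise ValueError("unknown century code")
    # Digits 2-7 hold the birth date as YYMMDD (assumption).
    birth = date(CENTURY_BASE[nid[0]] + int(nid[1:3]), int(nid[3:5]), int(nid[5:7]))
    # Subtract one year if the birthday has not yet occurred this year.
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    # Thirteenth digit: odd means male, even means female (assumption).
    gender = "male" if int(nid[12]) % 2 == 1 else "female"
    return age, gender


def run_etl(rows: Iterable[Dict[str, str]], today: date) -> sqlite3.Connection:
    """Extract raw records, transform them, and load them into a fact table."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    conn.execute(
        "CREATE TABLE crime_fact ("
        "crime_type TEXT, area TEXT, crime_date TEXT, age INTEGER, gender TEXT)"
    )
    for row in rows:
        age, gender = parse_national_number(row["national_number"], today)
        conn.execute(
            "INSERT INTO crime_fact VALUES (?, ?, ?, ?, ?)",
            (
                row["crime_type"].strip().lower(),  # normalize labels
                row["area"].strip().title(),
                row["date"],
                age,
                gender,
            ),
        )
    conn.commit()
    return conn


# Synthetic records, invented purely for illustration.
SAMPLE_ROWS = [
    {"national_number": "29001011234567", "crime_type": " Theft ",
     "area": "cairo", "date": "2018-03-02"},
    {"national_number": "30105150112354", "crime_type": "Assault",
     "area": "giza ", "date": "2019-01-20"},
]

warehouse = run_etl(SAMPLE_ROWS, today=date(2019, 6, 1))
loaded = warehouse.execute(
    "SELECT crime_type, area, age, gender FROM crime_fact ORDER BY age"
).fetchall()
```

Passing the reference date `today` explicitly keeps the age calculation reproducible. A production routine would also need to handle malformed or missing national numbers instead of raising an error.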
In this work, a framework has been built to analyze Egyptian forensic medicine data for crime prediction by applying data mining techniques (DMT). The remaining sections of this paper are organized as follows. Section (2) reviews related work. Section (3) presents the research methodology and the proposed framework in detail. Section (4) discusses the evaluation phase, the experimental results, and the inference phase. Finally, Section (5) presents the conclusion and suggested future work.

II. RELATED WORK

Many studies have analyzed large datasets to predict crimes using different data mining techniques. In [10], Labib introduced a data mining model that applies naïve Bayes, decision trees, and association rules to predict the most important attributes affecting crime. Real-world data were collected from the Egyptian Ministry of Interior. The data cover 1996 to 2012 in Alexandria, Egypt, and consist of criminals' personal information (age, profession, mental and educational level, social class, crime areas, and crime types). The accuracy of the algorithms used reached up to 92%.

In [11], Soliman presented the types of crimes in the Kuwait governorates, as well as the peak times of crime in Kuwait. This study used K-means and dynamic clustering algorithms to identify crime hot spots. Crime types were divided into categories such as drugs, forgery, adultery, assault, and suicide, and real data on about 1000 crime cases were collected from police departments in Kuwait. The accuracy was 98.7% after applying a random subspace classifier to the Kuwait datasets.

In [12], Zubi and Mahmud presented a proposed model for

International Journal of Computer Science and Information Security (IJCSIS), Vol. 17, No. 6, June 2019 56 https://sites.google.com/site/ijcsis/ ISSN 1947-5500