International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-9 Issue-3, January 2020. Published By: Blue Eyes Intelligence Engineering & Sciences Publication. Retrieval Number: B7663129219/2020©BEIESP. DOI: 10.35940/ijitee.B7663.019320

Abstract: Data analysis examines large amounts of data in order to uncover hidden patterns and correlations. Crime analysis is a systematic approach to identifying and analyzing crime patterns and trends. It plays a role in planning responses to crime problems and in formulating strategies for crime prevention. Instead of focusing on causes of crime such as the background of the offender, this work focuses primarily on the factors of crimes that happen every day. It predicts the category of crime that has a higher likelihood of occurring in a given area, and it visualizes crime by category, by day of the week and by month in the form of histograms and heat maps. The study depends on several variables such as class, latitude and longitude. Multinomial logistic regression is used for the forecast, with the day of the week, the district and the hour of the incident as predictors. This algorithm is used because the target variable has more than two values and the response variable has no ordering, which makes it well suited to datasets with multi-class labels. The forecast can help predict the occurrence of crime in vulnerable areas, which in turn can help reduce the crime rate by directing patrols to those areas.

Keywords: Data Analytics, Prediction, Regression, Machine Learning.

I. INTRODUCTION
Big data analytics involves collecting data from different sources, processing it, and finally delivering it to the organization as useful products.
Integrating raw data acquired from different sources into a single data item forms the core of big data analytics. There are two methodologies in data analytics: Exploratory Data Analytics (EDA) and Confirmatory Data Analytics (CDA). EDA is an approach to analyzing data sets that summarizes their main characteristics, often with visual methods. It shows what the data can tell us beyond the formal modeling task, and exploring the data can provide information about the number of factors required to represent it. CDA is a multivariate statistical procedure used to test how well the measured variables represent the construct. It is a useful tool for confirming the measurement theory. Five characteristics are the building blocks of an efficient data analytics solution: Accuracy, Completeness, Consistency, Uniqueness and Timeliness. Another aspect of data analytics is data visualization, which presents abstract information in graphical form. It allows users to spot patterns, trends, and correlations that otherwise might go unnoticed in traditional reports, tables, or spreadsheets. There are two basic types of data visualization: Exploration and Explanation. Within these two categories there are many ways to visualize data.

Revised Manuscript Received on January 05, 2020.
R. Rajadevi, Department of Information Technology, Kongu Engineering College, Perundurai, India. E-mail: rajdevi@kongu.ac.in
E. M. Roopa Devi, Department of Information Technology, Kongu Engineering College, Perundurai, India. E-mail: roopadevi@kongu.ac.in
S. Vinoth Kumar, Department of Information Technology, Kongu Engineering College, Perundurai, India. E-mail: vinoths@kongu.ac.in

The most common types of data visualization are Heat Map, Cartogram, Choropleth, Dot Distribution Map, Connected Scatter Plot, Polar Area Diagram, Time Series, Pie Chart, Histogram, Scatter Plot, Dendrogram, Ring Chart, Tree Diagram, Alluvial Diagram, Node-Link Diagram and Matrix. A heat map is a two-dimensional representation of data in which values are represented by colors. It provides an immediate visual summary of information and makes complex data sets easier to understand.

II. RELATED WORK
The existing system deals with a large set of data stored in a centralized database. Running an algorithm such as multinomial logistic regression on it has a high time complexity. The data comprise 39 categories of crime, which makes classification somewhat difficult. The system suffers from poor accuracy and from replicated values, both of which increase processing time. Because the system is centralized and does not distribute the tasks or the data, retrieval and processing consume a large amount of time. The analysis task is complex, and identifying the error rate is difficult.

III. PROBLEM STATEMENT
The existing method is complex to analyze, and its complex structure presents a complicated view to users. The designers of the system found it difficult to provide a proper working model. It has high space and time complexity. It is difficult to predict crime and to process the data from the crime records. The results are displayed neither in pictorial form nor in a comparative manner. Algorithms such as Random Forest and Naive Bayes have high complexity in both time and space, and the large volume of data further increases the complexity of prediction.

IV. PROPOSED SYSTEM
The proposed system identifies and visualizes the critical areas in which crime has a higher probability of occurring. These results are used to predict crime rates in sensitive areas.
The analysis depends on several factors such as latitude and longitude. Data are collected, classified and visualized using graphs. The multinomial logistic regression algorithm is used for prediction. The day of the week, the district and the hour of the incident are used as predictors.

Prediction of Crime Occurrence using Multinomial Logistic Regression
R. Rajadevi, E. M. Roopa Devi, S. Vinoth Kumar
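
The prediction step described in the proposed system can be sketched as follows. This is a minimal illustration and not the authors' implementation: it is written in plain Python, trains a multinomial logistic (softmax) model with stochastic gradient descent, and runs on synthetic data. The category names, district names, labelling rule, learning rate and epoch count are all assumptions invented for the example; only the choice of predictors (day of the week, district and hour of the incident) comes from the text.

```python
import math
import random

# Hypothetical crime categories and districts, invented for this sketch.
CATEGORIES = ["THEFT", "ASSAULT", "VANDALISM"]
DISTRICTS = ["NORTH", "SOUTH", "CENTRAL"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultinomialLogit:
    """Softmax regression over one-hot encoded categorical predictors."""

    def __init__(self, classes, lr=0.1, epochs=30, seed=0):
        self.classes = list(classes)
        self.lr = lr
        self.epochs = epochs
        self.rng = random.Random(seed)
        self.index = {}    # (column, value) -> position of its one-hot feature
        self.weights = []  # weights[k][j]: weight of feature j for class k
        self.bias = [0.0] * len(self.classes)

    def _active(self, row):
        # Positions of the one-hot features switched on by this row;
        # values never seen during training are ignored.
        return [self.index[key] for key in enumerate(row) if key in self.index]

    def predict_proba(self, row):
        active = self._active(row)
        scores = [b + sum(w[j] for j in active)
                  for w, b in zip(self.weights, self.bias)]
        return softmax(scores)

    def fit(self, rows, labels):
        # Assign one feature position to every (column, value) pair.
        for row in rows:
            for key in enumerate(row):
                self.index.setdefault(key, len(self.index))
        self.weights = [[0.0] * len(self.index) for _ in self.classes]
        order = list(range(len(rows)))
        for _ in range(self.epochs):
            self.rng.shuffle(order)
            for i in order:
                active = self._active(rows[i])
                probs = self.predict_proba(rows[i])
                for k, cls in enumerate(self.classes):
                    # Gradient of the cross-entropy loss w.r.t. the class score.
                    grad = probs[k] - (1.0 if labels[i] == cls else 0.0)
                    self.bias[k] -= self.lr * grad
                    for j in active:
                        self.weights[k][j] -= self.lr * grad
        return self

    def predict(self, row):
        probs = self.predict_proba(row)
        return self.classes[probs.index(max(probs))]


def make_toy_data(n=600, seed=1):
    """Generate synthetic (day, district, hour) rows with noisy labels."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = (rng.choice(DAYS), rng.choice(DISTRICTS), rng.randrange(24))
        if row[1] == "CENTRAL":
            label = "THEFT"
        elif row[2] >= 20 or row[2] < 4:
            label = "ASSAULT"
        else:
            label = "VANDALISM"
        if rng.random() < 0.1:  # replace about 10% of labels at random
            label = rng.choice(CATEGORIES)
        rows.append(row)
        labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
model = MultinomialLogit(CATEGORIES).fit(rows, labels)
accuracy = sum(model.predict(r) == y for r, y in zip(rows, labels)) / len(rows)
print(f"training accuracy: {accuracy:.2f}")
print(dict(zip(CATEGORIES, model.predict_proba(("Sat", "CENTRAL", 22)))))
```

The sketch treats the hour as a categorical value, so each hour gets its own weight per crime category, and the model returns one probability per category rather than a single yes/no answer, which matches an unordered target with more than two values. On real data a library implementation such as scikit-learn's `LogisticRegression` would normally replace the hand-written training loop.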