International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-9 Issue-3, January 2020
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: B7663129219/2020©BEIESP
DOI: 10.35940/ijitee.B7663.019320
Abstract: Data analysis examines large amounts of data to uncover
hidden patterns and correlations. Crime analysis is a systematic
approach to identifying and analyzing crime patterns and trends.
It plays a role in planning responses to crime problems and in
formulating strategies for crime prevention. Instead of focusing
on causes of crime such as the offender's background, this work
focuses primarily on the crime factors recorded each day. It
predicts the category of crime that has a higher likelihood of
occurrence in a given area and visualizes the results as
histograms and heat maps by category of crime, day of week, and
month. The study depends on several variables such as class,
latitude, and longitude. Multinomial logistic regression is used
for forecasting, with the day of the week, the district, and the
hour of the incident as predictors. This algorithm is used because
the target variable has more than two values and the response
variable has no ordering, which makes it efficient for handling
datasets with multi-class labels. The forecast can help predict
the occurrence of crime in vulnerable areas, which in turn can
help reduce the crime rate by directing patrols to those areas.
Keywords: Data Analytics, Prediction, Regression, Machine
Learning.
Revised Manuscript Received on January 05, 2020.
R. Rajadevi, Department of Information Technology, Kongu
Engineering College, Perundurai, India. E-mail: rajdevi@kongu.ac.in
E. M. Roopa Devi, Department of Information Technology, Kongu
Engineering College, Perundurai, India. E-mail: roopadevi@kongu.ac.in
S. Vinoth Kumar, Department of Information Technology, Kongu
Engineering College, Perundurai, India. E-mail: vinoths@kongu.ac.in
I. INTRODUCTION
Big data analytics involves collecting data from different
sources, processing it, and delivering the results as useful
products to the organization. Integrating raw data acquired from
different sources into a single data item forms the core of big
data analytics. There are two methodologies in data analytics:
Exploratory Data Analytics (EDA) and Confirmatory Data Analytics
(CDA). EDA is an approach to analyzing data sets to summarize
their main characteristics, often with visual methods. It shows
what the data can tell us beyond the formal modeling task. In EDA
the data are explored to obtain information about the number of
factors required to represent them. CDA is a multivariate
statistical procedure that tests how well the measured variables
represent the construct. It is a useful tool for confirming the
measurement theory. Five characteristics form the building blocks
of an efficient data analytics solution: Accuracy, Completeness,
Consistency, Uniqueness, and Timeliness. A further aspect of data
analytics is Data Visualization, which describes the presentation
of abstract information in graphical form. It allows users to spot
patterns, trends, and correlations that otherwise might go
unnoticed in traditional reports, tables, or spreadsheets. There
are two basic types of Data Visualization: Exploration and
Explanation. Within these categories there are many ways to make
data visual. The most common types of data visualization are Heat
Map, Cartogram, Choropleth, Dot Distribution Map, Connected
Scatter Plot, Polar Area Diagram, Time Series, Pie Chart,
Histogram, Scatter Plot, Dendrogram, Ring Chart, Tree Diagram,
Alluvial Diagram, Node-Link Diagram, and Matrix. A heat map is a
two-dimensional representation of data in which values are
represented by colors. It provides an immediate visual summary of
information and makes complex data sets easier to understand.
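The heat map idea described above can be illustrated with a minimal sketch. It assumes incidents are available as (day of week, hour) pairs, counts them into a two-dimensional grid, and maps each count to a shade character instead of a color so that it runs with the Python standard library alone. The records, the `SHADES` ramp, and the function names are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical incident records: (day of week, hour of day).
# Illustrative values only, not data from the study.
incidents = [
    ("Mon", 9), ("Mon", 9), ("Mon", 22),
    ("Fri", 22), ("Fri", 22), ("Fri", 22), ("Fri", 23),
    ("Sat", 1), ("Sat", 23), ("Sat", 23),
]

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
SHADES = " .:*#"  # lightest to darkest; stands in for a color scale


def count_grid(records):
    """Return a 7 x 24 grid of counts (rows: days, columns: hours)."""
    counts = Counter(records)
    return [[counts[(day, hour)] for hour in range(24)] for day in DAYS]


def shade(value, peak):
    """Map a count to a shade character, scaled against the largest count."""
    if peak == 0 or value == 0:
        return SHADES[0]
    # Ceiling division, so every non-zero count gets a visible shade
    # and the largest count gets the darkest one.
    index = -(-value * (len(SHADES) - 1) // peak)
    return SHADES[index]


def render(grid):
    """Render the grid as one text row per day, one character per hour."""
    peak = max(max(row) for row in grid)
    rows = []
    for day, row in zip(DAYS, grid):
        rows.append(day + " |" + "".join(shade(v, peak) for v in row) + "|")
    return "\n".join(rows)


grid = count_grid(incidents)
print(render(grid))
```

The same grid could be passed to a plotting library to obtain a colored heat map; the text rendering is used here only to keep the sketch self-contained.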
II. RELATED WORK
The existing system deals with a large set of data and relies on
a centralized database. Running an algorithm such as Multinomial
Logistic Regression on it has a high time complexity. The data
contain 39 categories of crime, which makes classification
difficult. The system suffers from poor accuracy and replicated
values, which increase processing time. Because the system is
centralized and does not distribute tasks or data, retrieval and
processing consume a large amount of time. The analysis task is
more complex, and identifying the error rate is difficult.
III. PROBLEM STATEMENT
The existing method is complex to analyze, and its complex
structure presents a complicated view to users. The designers of
the system found it difficult to provide a properly working model.
It has high space and time complexity. It is difficult to predict
crime and to process the data from crime records. It displays the
results neither in pictorial form nor in a comparative manner.
Algorithms such as Random Forest and Naive Bayes have higher
complexity in both time and space. The large volume of data
further increases the complexity of prediction.
IV. PROPOSED SYSTEM
The proposed system identifies and visualizes the critical areas
where crime has a higher probability of occurrence. These results
are used to predict crime rates in sensitive areas. The analysis
depends on several factors such as latitude and longitude. Data
are collected, classified, and visualized using graphs. The
multinomial logistic regression algorithm is used for prediction.
Day of week, district, and hour of the incident are used as
predictors.
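As a rough illustration of the prediction step, the sketch below fits a multinomial logistic regression (softmax regression) with day of week, district, and hour as one-hot predictors. It is a minimal standard-library sketch and not the authors' implementation: the training rows, district names, learning rate, and number of epochs are all assumptions chosen for illustration, and training uses plain stochastic gradient descent on the cross-entropy loss without regularization.

```python
import math

# Hypothetical training rows: (day of week, district, hour) -> category.
# Illustrative only; not taken from the study's dataset.
RECORDS = [
    (("Mon", "Central", 10), "THEFT"),
    (("Tue", "Central", 14), "THEFT"),
    (("Wed", "Central", 12), "THEFT"),
    (("Fri", "Harbor", 23), "ASSAULT"),
    (("Sat", "Harbor", 1), "ASSAULT"),
    (("Sat", "Harbor", 22), "ASSAULT"),
    (("Sun", "Park", 3), "VANDALISM"),
    (("Thu", "Park", 2), "VANDALISM"),
    (("Sun", "Park", 4), "VANDALISM"),
]

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
DISTRICTS = sorted({x[1] for x, _ in RECORDS})
CLASSES = sorted({y for _, y in RECORDS})
N_FEATURES = 1 + len(DAYS) + len(DISTRICTS) + 24  # bias + one-hot blocks


def encode(day, district, hour):
    """Return the indices of the active one-hot features (0 is the bias).

    A district not seen in RECORDS raises ValueError in this sketch.
    """
    return [
        0,
        1 + DAYS.index(day),
        1 + len(DAYS) + DISTRICTS.index(district),
        1 + len(DAYS) + len(DISTRICTS) + hour,
    ]


def softmax(scores):
    """Turn per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, active):
    """Class probabilities for one example given its active features."""
    scores = [sum(w[i] for i in active) for w in weights]
    return softmax(scores)


def train(records, epochs=200, lr=0.5):
    """Fit one weight vector per class by stochastic gradient descent."""
    weights = [[0.0] * N_FEATURES for _ in CLASSES]
    data = [(encode(*x), CLASSES.index(y)) for x, y in records]
    for _ in range(epochs):
        for active, label in data:
            probs = predict_proba(weights, active)
            for k, w in enumerate(weights):
                # Cross-entropy gradient for class k: p_k - y_k
                grad = probs[k] - (1.0 if k == label else 0.0)
                for i in active:
                    w[i] -= lr * grad
    return weights


def predict(weights, day, district, hour):
    """Return a mapping from crime category to predicted probability."""
    probs = predict_proba(weights, encode(day, district, hour))
    return dict(zip(CLASSES, probs))


weights = train(RECORDS)
forecast = predict(weights, "Sat", "Harbor", 23)
print(max(forecast, key=forecast.get), forecast)
```

Because the response categories are unordered, each class gets its own weight vector and the softmax function converts the class scores into probabilities. On a dataset of realistic size, a library implementation with regularization would usually be preferable to this hand-written loop.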
Prediction of Crime Occurrence using
Multinomial Logistic Regression
R. Rajadevi, E. M. Roopa Devi, S. Vinoth Kumar