International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 827
Analysis of Multiple Classification algorithms using Real Time
Twitter Data
Sharmila Bhargava¹, Dr. Rachna Dubey², Priyanka Gupta³
¹M.Tech Scholar, Department of Computer Science & Engineering, Lakshmi Narain College of Technology &
Excellence, Bhopal (M.P), India.
²Professor & Head, Department of CSE, Lakshmi Narain College of Technology & Excellence, Bhopal (M.P), India.
³Assistant Professor, Department of CSE, Lakshmi Narain College of Technology & Excellence, Bhopal (M.P), India.
---------------------------------------------------------------------***----------------------------------------------------------------------
ABSTRACT:- The growing volume of data creates demand for
data analysis tools that detect regularities in these data.
Data mining has emerged as an important field that provides
mechanisms for data analysis, discovery of hidden knowledge,
and autonomous decision making in many application domains.
Supervised machine learning is the search for algorithms
that reason from explicitly supplied instances to produce
general hypotheses, which then make predictions about
future instances or events. In other words, the goal of
supervised learning is to build a concise model of the
distribution of class labels in terms of predictor features.
The resulting classifier is then used to assign class labels
to testing instances where the values of the predictor
features are known but the value of the class label is
unknown. This paper explains various supervised machine
learning classification techniques.
In this paper, we discuss the classification algorithms
available today, how they work, and their advantages and
disadvantages. The algorithms discussed are Naïve Bayes,
support vector machines (SVM), random forest, decision tree
and logistic regression.
Keywords: Classification, Naïve Bayes, Random forest,
Multiple regression, dependent variable, independent
variables, predictor variable, response variable
I. INTRODUCTION
Predictive modelling can be described mathematically as
finding the relationship between a target, response, or
“dependent” variable and various predictor or “independent”
variables, with the goal of measuring future values of those
predictors and inserting them into the mathematical
relationship to predict future values of the target
variable. It is essential to give some measure of
uncertainty for the predictions, typically a prediction
interval with an assigned level of confidence (e.g. 95%).
Regression analysis establishes a relationship between a
dependent or outcome variable and a set of predictors.
Regression, as a data mining technique, is supervised
learning. Supervised learning partitions the database into
training and validation data. The techniques used in this
research were simple linear regression and multiple linear
regression. Some differences between the use of regression
in statistics versus data mining are: in statistics the data
is a sample from a population, but in data mining the data
is taken from a large database (e.g. 1 million records).
Also, in statistics the regression model is built from a
given sample, but in data mining the regression model is
built from a part of the data (the training data).
Predictive analytics encompasses a number of techniques from
statistics, data mining and game theory that analyze current
and historical facts to make predictions about future
events. These techniques are sometimes divided into three
categories: predictive models, descriptive models and
decision models.
Predictive models look for relationships and patterns that
usually lead to a certain behaviour, point to fraud, predict
system failures, and so on. By determining the explanatory
variables, we can predict outcomes in the dependent
variables.
Descriptive models aim at creating partitions or segments;
they are generally used to classify customers based on, for
instance, socio-demographic characteristics, life cycle,
profitability, product needs and so on. Where predictive
models focus on a specific event or behaviour, descriptive
models identify as many different relationships as possible.
Decision models use optimization techniques to predict the
results of decisions. This branch of predictive analytics is
applied in operations research, including areas such as
resource optimization in networking and route planning in
many industries.
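The regression workflow described in this introduction (partition the data into training and validation sets, fit a simple linear regression on the training part, and attach a 95% prediction interval to each prediction) can be sketched as follows. This is only an illustrative sketch, not the procedure used in this paper: the synthetic data, the 70/30 split, the function names, and the use of a normal quantile in place of the Student-t quantile (reasonable only for larger training sets) are all assumptions made for the example.

```python
import random
from statistics import NormalDist, mean


def fit_simple_linear(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def prediction_interval(x_new, xs, ys, intercept, slope, level=0.95):
    """Approximate prediction interval for a new observation at x_new.

    Assumption: a normal quantile replaces the Student-t quantile,
    which is a reasonable approximation only for larger samples.
    """
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se = (s2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_hat = intercept + slope * x_new
    return y_hat - z * se, y_hat + z * se


# Hypothetical synthetic data: y = 2 + 3x plus Gaussian noise.
rng = random.Random(0)
records = []
for _ in range(200):
    x = rng.uniform(0.0, 10.0)
    records.append((x, 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)))

# Partition the records into training (70%) and validation (30%) data.
rng.shuffle(records)
cut = len(records) * 7 // 10
train, valid = records[:cut], records[cut:]
train_x = [x for x, _ in train]
train_y = [y for _, y in train]

# The model is built from the training part only.
intercept, slope = fit_simple_linear(train_x, train_y)

# Share of validation targets that fall inside their 95% interval.
hits = 0
for x, y in valid:
    low, high = prediction_interval(x, train_x, train_y, intercept, slope)
    hits += low <= y <= high
coverage = hits / len(valid)

print(f"intercept={intercept:.2f} slope={slope:.2f} coverage={coverage:.2f}")
```

If the interval is well calibrated, the printed coverage on the validation data should be close to the chosen confidence level; a much lower value would suggest that the linear model or the noise assumption does not fit the data.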
1.1 Data Mining Techniques
The use of stored data for decision making in business
remains very poor even though data storage grows
exponentially. Data mining is also known as knowledge
discovery, that is, finding important information in a given
scenario. The knowledge extracted allows predicting future
behaviour. This allows business owners to take positive,
knowledge-driven decisions. Data mining is