International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1471
Sentiment Analysis and Classification of Tweets Using Data Mining
Md Shoeb¹, Jawed Ahmed²
¹Student, Department of Computer Science, Hamdard University, New Delhi, India
²Assistant Professor, Department of Computer Science, Hamdard University, New Delhi, India
Abstract - These days, social networking sites such as Twitter
and Facebook are a major means of communication for
internet users, which makes them an important source for
understanding people's opinions, views and emotions. In
this paper, we use data mining classification techniques to
perform sentiment analysis on the views people share on
Twitter, one of the most widely used social networking sites
today. We collect a dataset of tweets from Twitter and apply
text mining techniques (transformation, tokenization,
stemming, etc.) to convert them into a usable form, which we
then use to build a sentiment classifier. The RapidMiner tool
is used to build the classifier. We apply three different
classifiers to the data and compare their results to determine
which one gives the best accuracy.
Key Words: RapidMiner, classification, data mining,
sentiment analysis
1. INTRODUCTION
In recent times, people have been using social networking
sites such as Twitter, Facebook and blogs to express their
sentiments, views, feedback and opinions, and the opinions
of other people have always mattered to us in many ways.
This creates a need to analyze these views and sentiments.
Sentiment analysis is the application of natural language
processing, text analytics and computational linguistics to
identify and extract useful information from source
material[1]. It aims to determine the attitude of a speaker or
writer towards a topic or event by analyzing their comments
on social networking sites. Data mining, also called
knowledge discovery in databases (KDD), refers to the
complete process of discovering useful knowledge from data.
It is the process of obtaining interesting and useful patterns
and relationships from large volumes of data[2].
Data classification is the process of organizing data into
categories for its most effective and efficient use. The goal
of classification is to accurately predict the target class for
each case in the data. An algorithm that implements
classification is known as a classifier. The term "classifier"
sometimes also refers to the mathematical function,
implemented by a classification algorithm, that maps input
data to a category.
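To illustrate the distinction between a classification algorithm and the classifier function it produces, the following is a minimal sketch in plain Python. It uses a nearest-centroid algorithm purely as an assumed, illustrative choice (it is not one of the classifiers used in this work), and the two-number feature vectors and labels are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def train_nearest_centroid(xs: List[Vector], ys: List[str]) -> Callable[[Vector], str]:
    """Classification algorithm: learns one centroid (mean vector) per class
    and returns the classifier, i.e. a function from feature vector to class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def classifier(x: Vector) -> str:
        # Predict the class whose centroid is closest in Euclidean distance.
        return min(centroids, key=lambda y: dist(centroids[y], x))

    return classifier


# Hypothetical features: [count of positive words, count of negative words]
X = [[3, 0], [2, 1], [0, 3], [1, 2]]
y = ["positive", "positive", "negative", "negative"]
f = train_nearest_centroid(X, y)
print(f([4, 1]))  # positive
print(f([0, 5]))  # negative
```

Here `train_nearest_centroid` plays the role of the classification algorithm, while the returned function `f` is the classifier in the second sense of the term.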
Text mining is the analysis of textual data in order to extract
useful and meaningful information from it. Generally, natural
language processing or information retrieval methods, or
other text pre-processing steps, are applied first to make the
text suitable for data mining algorithms.
In this work, we use three different classifiers to extract
the thoughts and sentiments that people share on Twitter
through their tweets and to classify them into different
categories. We then compare the results to find out which
classifier performs best in terms of precision, recall and
accuracy.
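The three comparison criteria can be computed directly from a classifier's predictions. The following minimal Python sketch assumes a single class of interest (here the hypothetical label "pos") for precision and recall; the gold labels and predictions are invented for illustration and are not results from this work.

```python
from typing import Dict, List


def evaluate(y_true: List[str], y_pred: List[str], positive: str) -> Dict[str, float]:
    """Accuracy over all cases, plus precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical gold labels and predictions for six tweets
gold = ["pos", "pos", "neg", "neg", "neu", "pos"]
pred = ["pos", "neg", "neg", "pos", "neu", "pos"]
print(evaluate(gold, pred, positive="pos"))
```

In this example 4 of the 6 predictions are correct, so accuracy is 4/6; with 2 true positives, 1 false positive and 1 false negative for "pos", precision and recall are both 2/3.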
2. RELATED WORKS
This section reviews previous work on sentiment analysis of
live data. A great deal of work has been carried out in this
field on user data from social media in order to extract
people's sentiments towards topics, products, trends, etc.
These studies focus on extracting useful information from
users' natural language and processing it to obtain the
sentiments actually expressed. Osaimi and Badruddin[3]
have worked extensively on sentiment analysis of Arabic-
language tweets on Twitter. They built different classifiers
by training them on a suitable dataset and then analyzed the
accuracy and results of these classifiers in predicting the
correct sentiments. Pragya Tripathi, Santosh Kr Vishwakarma
and Ajay Lala[4] proposed work on sentiment analysis of
English tweets using RapidMiner. They collected a natural-
language dataset from Twitter, applied text mining techniques
to it, and used it to build a sentiment classifier.
O'Keefe et al.[5] proposed a technique for selecting features
based on attribute weights and applied two classifiers to the
result, namely Naïve Bayes and SVM. In this work, the authors
obtained a classification accuracy of 87.15% using only 29%
of the attributes. Pak and Paroubek[6] have also worked in
this field; they used Twitter data to perform linguistic
analysis and then built a highly efficient classifier. Pang and
Lee[7] presented a broad overview of the existing work done
by Pak and Paroubek. In their survey, the authors describe
existing techniques and approaches for information retrieval.
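Naïve Bayes appears in several of the works above. As a rough illustration only (not the implementation used in any cited work or in RapidMiner), a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing over pre-processed token lists can be sketched in plain Python as follows; the tiny training set is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set, Tuple


def train_nb(docs: List[Tuple[List[str], str]]) -> Callable[[List[str]], str]:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_docs: Counter = Counter()                       # documents per class
    word_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per class
    vocab: Set[str] = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_docs.values())

    def predict(tokens: List[str]) -> str:
        best_label, best_score = "", -math.inf
        for label in class_docs:
            total = sum(word_counts[label].values())
            # log P(class) + sum of log P(token | class); tokens outside the
            # training vocabulary are skipped.
            score = math.log(class_docs[label] / n_docs)
            for t in tokens:
                if t in vocab:
                    score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


train = [
    (["love", "great", "phone"], "positive"),
    (["great", "service"], "positive"),
    (["hate", "slow", "phone"], "negative"),
    (["bad", "service", "hate"], "negative"),
]
predict = train_nb(train)
print(predict(["great", "phone"]))  # positive
print(predict(["slow", "bad"]))     # negative
```

Working in log space avoids numerical underflow when many per-token probabilities are multiplied, and the add-one smoothing keeps a token that never occurred with a class from driving that class's probability to zero.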
K. Bhuvaneswari and R. Parimala[8] proposed a method for
sentiment classification using correlation-based feature
selection. They applied various data pre-processing
techniques, used a correlation attribute method for feature
selection, and finally implemented two classifiers, namely
Naïve Bayes and Support Vector Machine, and evaluated the
results. Farhan Laeeq, Md. Tabrez Nafis and Mirza Rahil
Beg[9] proposed work on sentiment classification of social
media. In their