ISSN 2394-3777 (Print)
ISSN 2394-3785 (Online)
Available online at www.ijartet.com
International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)
Vol. 4, Issue 12, December 2017
All Rights Reserved © 2017 IJARTET
Large-scale Sentiment Analysis Using Hadoop
Dasari Prasad¹, G.N.V.G. Sirisha², G. Mahesh³, G.V. Padma Raju⁴
¹P.G. Student, CSE Department, SRKR Engineering College, Bhimavaram, India
²Assistant Professor, CSE Department, SRKR Engineering College, Bhimavaram, India
³Associate Professor, CSE Department, SRKR Engineering College, Bhimavaram, India
⁴Professor, CSE Department, SRKR Engineering College, Bhimavaram, India
Email: {prasad.dasari1126¹, sirishagadiraju², gadirajumahesh³, gvpadmaraju⁴}@gmail.com
Abstract: Sentiment analysis uses text analytics to identify and categorize the polarity of opinions expressed in a
piece of text. It infers the intention of a customer from a given feedback text. Supervised machine learning
techniques are among the popular methods for sentiment analysis. The accuracy of these algorithms increases with the size
of the training data. Large volumes of user reviews are available online; to leverage them, a Hadoop-based sentiment analysis
system is proposed in this paper. The proposed system applies a Naïve Bayesian classifier to detect the polarity of users’
opinions. The system achieved 94% and 86% accuracy when tested on two datasets, namely a product review dataset and a movie
review dataset. These accuracies were obtained even without applying pre-processing steps like parts-of-speech tagging.
Keywords: sentiment analysis, Hadoop, supervised machine learning, opinions, text analytics, naïve Bayesian
I. INTRODUCTION
Sentiment analysis helps in identifying whether the writer’s
attitude towards an individual, organization, event, product
or topic is positive, negative or neutral. The World Wide Web
has become a part of everyone’s life. More and more people
are using the Web for tasks such as information retrieval,
e-commerce and social networking. It has enabled
companies, organizations and political parties to easily reach
their customers and citizens through online advertising and
online campaigning. People are also using blogs and social
networking sites to share their opinions with friends, family
and society at large.
Compared to traditional media, broadcasting and
narrowcasting are much easier and more cost-effective with the
World Wide Web. This is only one side of the coin; the other
side is that fake news and negative opinions are also
spreading at a faster rate through the Web. People are sharing
their dislikes, dissatisfaction and anger about a product,
event, individual or party through blog posts, reviews and
social media. If actions such as immediate dialogue with
dissatisfied customers and compensation for product errors
are taken, the spread of negative sentiment can be reduced.
Organizations and governments are gauging the success of a
product or scheme by analysing the responses of customers or
citizens in social media networks, reviews, tweets etc. So,
analysing the sentiment polarity of a piece of text has
become the need of the day. From the data granularity point of
view, there are three levels of sentiment analysis: document
level, sentence level and aspect (feature) level. There is not
much difference between document level and sentence level
sentiment analysis, as sentences are nothing but short
documents, as in [1]. Sometimes, the writer expresses
different views about different aspects (features) of the same
product; aspect level analysis addresses this case.
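The three granularity levels can be illustrated with a minimal Python sketch. It is not the classifier proposed in this paper: polarity is assigned by a toy word-count scorer, and the word lists, function names, the example review and the clause-splitting rule are all assumptions introduced only for illustration.

```python
import re

# Hypothetical toy word lists, chosen only for this illustration.
POSITIVE = {"good", "great", "excellent", "sharp", "love"}
NEGATIVE = {"bad", "poor", "terrible", "weak", "hate"}


def polarity(text):
    """Label text by counting toy positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def document_level(review):
    """One label for the whole review."""
    return polarity(review)


def sentence_level(review):
    """One label per sentence; each sentence is treated as a short document."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]
    return [(s, polarity(s)) for s in sentences]


def aspect_level(review, aspects):
    """One label per aspect, taken from the clause that mentions it.

    Splitting clauses on punctuation and the word "but" is a crude
    assumption, not a general method.
    """
    parts = re.split(r"[.!?,;]+|\bbut\b", review.lower())
    labels = {}
    for clause in (p.strip() for p in parts if p.strip()):
        for aspect in aspects:
            if aspect in clause.split():
                labels[aspect] = polarity(clause)
    return labels


review = "The camera is excellent but the battery is poor. I love the screen."
print(document_level(review))
print(sentence_level(review))
print(aspect_level(review, ["camera", "battery", "screen"]))
```

On this made-up review, the document level yields a single label, while the aspect level separates the opinion on the camera from the opinion on the battery, which the coarser levels merge.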
Human beings are social beings, and it is natural for us to
ask the opinion of others before we make a choice. Earlier,
people generally sought the opinions of friends and family,
but with the availability of large volumes of opinion data
online in the form of reviews, ratings etc., we now rely on
them. So, it is also necessary to automatically identify the
polarity of large review datasets and consolidate the results
so that it becomes easy for users to use them.
A number of tools and techniques have been developed in the
area of sentiment analysis over the past decade. Examples of
such tools include Facebook Insights and Red Opal, as in
[3]. Sentiment analysis methods are broadly classified into
machine learning methods, lexicon based methods and
hybrid methods, as in [2]. Machine learning methods are in
turn classified as supervised and unsupervised methods.
Lexicon based approaches are classified as dictionary based
and corpus based approaches. Hybrid approaches use both
machine learning and lexicon based methods. Among these