ISSN 2394-3777 (Print), ISSN 2394-3785 (Online)
Available online at www.ijartet.com
International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)
Vol. 4, Issue 12, December 2017
All Rights Reserved © 2017 IJARTET

Large-scale Sentiment Analysis Using Hadoop

Dasari Prasad 1, G.N.V.G. Sirisha 2, G. Mahesh 3, G.V. Padma Raju 4
P.G. Student, CSE Department, SRKR Engineering College, Bhimavaram, India 1
Assistant Professor, CSE Department, SRKR Engineering College, Bhimavaram, India 2
Associate Professor, CSE Department, SRKR Engineering College, Bhimavaram, India 3
Professor, CSE Department, SRKR Engineering College, Bhimavaram, India 4
Email: {prasad.dasari1126 1, sirishagadiraju 2, gadirajumahesh 3, gvpadmaraju 4}@gmail.com

Abstract: Sentiment analysis uses text analytics to identify and categorize the polarity of opinions expressed in a piece of text. It infers the intention of a customer from a given feedback text. Supervised machine learning techniques are among the popular methods for sentiment analysis. The accuracy of these algorithms increases with the size of the training data. Large volumes of user reviews are available online; to leverage them, a Hadoop-based sentiment analysis system is proposed in this paper. The proposed system applies a Naïve Bayesian classifier to detect the polarity of users’ opinions. The system achieved 94% accuracy and 86% accuracy when tested on two datasets, namely a product review dataset and a movie review dataset. These accuracies were achieved even without applying pre-processing steps such as part-of-speech tagging.

Keywords: sentiment analysis, Hadoop, supervised machine learning, opinions, text analytics, naïve Bayesian

I. INTRODUCTION

Sentiment analysis helps in identifying whether the writer’s attitude towards an individual, organization, event, product or topic is positive, negative or neutral. The World Wide Web has become a part of everyone’s life.
More and more people are using the Web for tasks such as information retrieval, e-commerce and social networking. It has enabled companies, organizations and political parties to reach their customers and citizens easily through online advertising and online campaigning. People also use blogs and social networking sites to share their opinions with friends, family and society at large. Compared to traditional media, broadcasting and narrowcasting are much easier and more cost-effective with the World Wide Web. This is only one side of the coin; the other side is that fake news and negative opinions also spread faster through the Web. People share their dislikes, dissatisfaction and anger about a product, event, individual or party through blog posts, reviews and social media. If actions such as immediate dialogue with dissatisfied customers and compensation for product errors are taken, the spread of negative sentiment can be reduced. Organizations and governments now gauge the success of a product or scheme by analysing the responses of customers or citizens in social media networks, reviews, tweets, etc. So, analysing the sentiment polarity of a piece of text has become the need of the day.

From the data granularity point of view, there are three levels of sentiment analysis: document level, sentence level and aspect (feature) level. There is not much difference between document level and sentence level sentiment analysis, as sentences are nothing but short documents, as in [1]. Sometimes the writer expresses different views about different aspects (features) of the same product, which is the case that aspect level analysis addresses.

Human beings are social beings, and it is natural for us to ask the opinion of others before we make a choice. Earlier, people generally took the opinions of friends and family, but with the availability of large volumes of opinion data online in the form of reviews, ratings, etc., we now rely on them.
So, it is also necessary to automatically identify the polarity of large review datasets and consolidate the results so that users can use them easily. A number of tools and techniques have been developed in the area of sentiment analysis over the past decade. Examples of such tools are Facebook Insights, Red Opal, etc., as in [3]. Sentiment analysis methods are broadly classified into machine learning methods, lexicon based methods and hybrid methods, as in [2]. Machine learning methods are in turn classified as supervised and unsupervised methods. Lexicon based approaches are classified as dictionary based approaches and corpus based approaches. Hybrid approaches use both machine learning and lexicon based methods. Among these