International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-9 Issue-4, February 2020
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: D9062019420/2020©BEIESP
DOI: 10.35940/ijitee.D9062.029420
A ML and NLP based Framework for Sentiment
Analysis on Bigdata
D. Krishna Madhuri, R. V. V. S. V PRASAD
Abstract: Big data has multiple sources, and social media is one
of them. Such data is rich in people's opinions and needs an
automated approach based on Natural Language Processing (NLP)
and Machine Learning (ML) to obtain and summarize social
feedback. With ML as an integral part of Artificial Intelligence
(AI), machines can demonstrate intelligence exhibited by
humans. ML is widely used in different domains. With the
proliferation of Online Social Networks (OSNs), people from all
walks of life exchange their views instantly. OSNs have thus become
platforms where people's opinions are available. In other words,
social feedback on products and services is available. For
instance, Twitter produces large volumes of such data, which is of
much use to enterprises in garnering the Business Intelligence (BI)
needed to make expert decisions. In addition to traditional
feedback systems, the feedback (opinions) on social networks
adds depth to the intelligence used to revise strategies and policies.
Sentiment analysis is the technique employed to
analyze opinions and classify them as positive, negative or
neutral. Existing studies usually treat overall sentiment
analysis and aspect-based sentiment analysis in isolation, and
then introduce a variety of methods to analyse either overall
sentiments or aspect-level sentiments, but not both. The use of a
probabilistic topic model is a novel approach in sentiment
analysis. In this paper, we propose a framework for
comprehensive analysis of overall and aspect-based sentiments.
The framework is realized with aspect-based topic modelling for
sentiment analysis and ensemble learning algorithms. It also
employs several ML algorithms with a supervised learning approach.
Benchmark datasets used in the international SemEval conferences
are used for the empirical study. Experimental results show the
efficiency of the proposed framework over the state of the art.
Index Terms: Big data, NLP, sentiment analysis, machine
learning, artificial intelligence, ensemble learning, Twitter,
aspect-based sentiment analysis
I. INTRODUCTION
Enterprises in the real world maintain data warehouses for
keeping track of business data. Such data assumes the
characteristics of big data and provides a wealth of knowledge
when discovered and interpreted using data mining
techniques. Enterprises invariably use this knowledge
to make strategic decisions for growth.
However, the business intelligence extracted from a data
warehouse is considered inadequate in the contemporary era,
where Online Social Networks (OSNs) produce voluminous
data with significant latent trends.
Revised Manuscript Received on February 2, 2020.
D. Krishna Madhuri, Assistant Professor, Dept. of CSE, GRIET,
Hyderabad, Telangana, India. Email: krishnamadhuri.530@gmail.com
Dr. R. V. V. S. V Prasad, Professor & Head, Dept. of IT,
Swarandhra College of Engineering & Technology, Narsapur, India.
Email: ramayanam.prasad@gmail.com
Twitter [1] is one such OSN, which exhibits exponential
growth in tweets every year. This data is a goldmine
for researchers and enterprises when exploited using a
recently emerged technique known as opinion
mining or sentiment analysis. Many researchers have contributed
to exploiting data from OSNs and other Internet sources where
reviews are made available. In addition to classifying
sentiments as positive, negative and neutral,
aspect-based sentiment analysis has recently gained importance [1].
Moreover, previous studies usually treat overall sentiment
analysis and aspect-based sentiment analysis in isolation,
and then introduce a variety of methods to analyse either
overall sentiments or aspect-level sentiments, but not both.
The use of a probabilistic topic model is a novel approach in
sentiment analysis.
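To make the distinction between overall and aspect-level sentiment
concrete, the following is a minimal sketch and not the method
proposed in this paper. The polarity lexicon, the aspect terms and
the clause-splitting rule are all hypothetical and chosen only for
illustration.

```python
import re

# Hypothetical toy resources, chosen only for illustration.
LEXICON = {"good": 1, "great": 1, "love": 1, "poor": -1, "bad": -1, "slow": -1}
ASPECT_TERMS = {"camera", "screen", "battery"}


def to_label(score):
    """Map a signed polarity score to one of the three sentiment classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity(tokens):
    """Sum the lexicon polarities of the tokens (unknown words count as 0)."""
    return sum(LEXICON.get(token, 0) for token in tokens)


def overall_sentiment(text):
    """Return a single label for the whole review."""
    return to_label(polarity(re.findall(r"[a-z]+", text.lower())))


def aspect_sentiments(text):
    """Return one label per aspect term, scored within its own clause."""
    scores = {}
    # Split on punctuation and the contrastive "but" to get rough clauses.
    for clause in re.split(r"[,.;!?]|\bbut\b", text.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        for aspect in ASPECT_TERMS.intersection(tokens):
            scores[aspect] = scores.get(aspect, 0) + polarity(tokens)
    return {aspect: to_label(score) for aspect, score in scores.items()}


review = "The camera is good, but the screen is poor."
print(overall_sentiment(review))
print(aspect_sentiments(review))
```

For this review the overall label is neutral, because the two
opinion words cancel out, while the aspect-level view reports a
positive camera and a negative screen. Treating the two analyses
in isolation would lose one of these views.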
Latent Dirichlet allocation (LDA) is a generative process
model used for processing documents in various
applications [13], [27]. In fact, it is widely used in
processing online reviews or opinions in Twitter tweets, as
reviewed in [20], [21]. There are many supervised learning
approaches, unsupervised methods [19], [22] and
semi-supervised approaches [24]. Neural Networks (NNs) [7] and
Convolutional Neural Networks (CNNs) [25] are also used
for sentiment analysis. Ensemble classifiers have also been used
for sentiment analysis, as studied in [26]. Feature selection is
given importance in sentiment analysis; it is applied
based on syntax models and context
[29]. Topic modelling based on LDA is widely used [3], [5],
[6], [12], [30]. Along with topic modelling, aspect-based
approaches are found in [4], [10], [31], [34] and [39].
Aspect-based sentiment analysis can provide more useful
knowledge because of its utility in making decisions.
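As an illustration of what "generative process" means for LDA, the
sketch below samples a small synthetic corpus in the standard way:
each topic draws a word distribution from a Dirichlet prior, each
document draws a topic mixture from another Dirichlet prior, and
each word is produced by first picking a topic and then a word
from that topic. The function names, the vocabulary and the
hyperparameter values are assumptions made for this example and
do not come from the paper. The sketch covers only the generative
direction, not inference.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_len,
                    alpha=0.5, beta=0.5, seed=0):
    """Sample documents from the LDA generative process.

    alpha and beta are symmetric Dirichlet hyperparameters
    (illustrative values, not taken from the paper).
    """
    rng = random.Random(seed)
    # One word distribution per topic: phi_k ~ Dirichlet(beta).
    topic_word = [sample_dirichlet([beta] * len(vocab), rng)
                  for _ in range(n_topics)]
    docs = []
    for _ in range(n_docs):
        # Per-document topic mixture: theta_d ~ Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this position, then a word from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topic_word[z])[0])
        docs.append(words)
    return docs, topic_word


vocab = ["battery", "screen", "price", "service", "good", "bad"]
docs, topic_word = generate_corpus(vocab, n_topics=2, n_docs=3,
                                   doc_len=5, seed=1)
for doc in docs:
    print(" ".join(doc))
```

Fitting LDA to real reviews runs this process in reverse: given
only the observed words, it estimates the topic-word and
document-topic distributions.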
Hai et al. [41] proposed a topic modelling approach for
analysing sentiments. It was efficient when compared with
the state of the art. However, it has the following drawbacks.
It has no provision for spatio-temporal sentiment analysis of
online reviews or Twitter tweets as part of semantic aspect
detection and aspect-level sentiment identification.
Their model does not include estimating the number of latent
topics for efficient probabilistic topic modelling.
Their model also has no provision for deep learning, which
leads to mediocre performance in sentiment analysis. To
overcome these drawbacks, the aim of the proposed
research is to develop a comprehensive framework that
considers probabilistic topic modelling with both aspect-level
and overall sentiment analysis in sentiment identification.
Our contributions in this paper are as follows.
1. Proposed a comprehensive framework that considers
overall sentiment analysis and aspect based sentiment
analysis with an effective training model.