International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-9 Issue-4, February 2020, p. 189
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: D9062019420/2020©BEIESP
DOI: 10.35940/ijitee.D9062.029420

A ML and NLP based Framework for Sentiment Analysis on Bigdata

D. Krishna Madhuri, R. V. V. S. V Prasad

Abstract: Big data has multiple sources, and social media is one of them. Such data is rich in people's opinions and needs an automated approach based on Natural Language Processing (NLP) and Machine Learning (ML) to obtain and summarize social feedback. With ML as an integral part of Artificial Intelligence (AI), machines can demonstrate intelligence of the kind exhibited by humans. ML is widely used in different domains. With the proliferation of Online Social Networks (OSNs), people from all walks of life exchange their views instantly. OSNs have thus become platforms where people's opinions are available; in other words, social feedback on products and services is available. For instance, Twitter produces large volumes of such data, which enterprises can use to garner Business Intelligence (BI) for making expert decisions. In addition to traditional feedback systems, the feedback (opinions) on social networks adds depth to the intelligence used to revise strategies and policies. Sentiment analysis is the technique employed to analyze opinions and classify them as positive, negative or neutral. Existing studies usually treat overall sentiment analysis and aspect-based sentiment analysis in isolation, and then introduce a variety of methods to analyse either overall sentiments or aspect-level sentiments, but not both. The use of a probabilistic topic model is a novel approach in sentiment analysis. In this paper, we propose a framework for comprehensive analysis of overall and aspect-based sentiments.
The framework is realized with aspect-based topic modelling for sentiment analysis and ensemble learning algorithms. It also employs many ML algorithms with a supervised learning approach. Benchmark datasets used in the international SemEval conferences are used for the empirical study. Experimental results revealed the efficiency of the proposed framework over the state of the art.

Index Terms – Big data, NLP, sentiment analysis, machine learning, artificial intelligence, ensemble learning, Twitter, aspect-based sentiment analysis

Revised Manuscript Received on February 2, 2020.
D. Krishna Madhuri, Assistant Professor, Dept of CSE, GRIET, Hyderabad, Telangana, India. Email: krishnamadhuri.530@gmail.com
Dr. R. V. V. S. V Prasad, Professor & Head, Dept of IT, Swarandhra College of Engineering & Technology, Narsapur, India. Email: ramayanam.prasad@gmail.com

I. INTRODUCTION

Enterprises in the real world maintain data warehouses to keep track of business data. Such data assumes the characteristics of big data and provides a wealth of knowledge when discovered and interpreted using data mining techniques. Enterprises invariably use such know-how to make strategic decisions for growth. However, the business intelligence extracted from a data warehouse is considered inadequate in the contemporary era, where Online Social Networks (OSNs) produce voluminous data with significant latent trends. Twitter [1] is one such OSN, and it exhibits exponential growth in tweets every year. This data is a goldmine for researchers and enterprises when exploited with a recently emerged technique known as opinion mining or sentiment analysis. Many researchers have contributed to exploiting data from OSNs and other Internet sources where reviews are available. In addition to classifying sentiments as positive, negative or neutral, aspect-based sentiment analysis has recently been given importance [1].
Moreover, previous studies usually treat overall sentiment analysis and aspect-based sentiment analysis in isolation, and then introduce a variety of methods to analyse either overall sentiments or aspect-level sentiments, but not both. The use of a probabilistic topic model is a novel approach in sentiment analysis. Latent Dirichlet allocation (LDA) is the generative process model used for processing documents in various applications [13], [27]. In fact, it is widely used in processing online reviews or opinions expressed in Twitter tweets, as reviewed in [20], [21]. There are many supervised learning approaches, unsupervised methods [19], [22] and semi-supervised approaches [24]. Neural Networks (NNs) [7] and Convolutional Neural Networks (CNNs) [25] are also used for sentiment analysis. Ensemble classifiers are used for sentiment analysis as studied in [26]. Feature selection is also given importance in sentiment analysis; it is applied based on syntax models and context [29]. Topic modelling based on LDA is widely used [3], [5], [6], [12], [30]. Along with topic modelling, aspect-based approaches are found in [4], [10], [31], [34] and [39]. Aspect-based sentiment analysis can provide more useful knowledge because of its utility in decision making.

Hai et al. [41] proposed a topic modelling approach for analysing sentiments. It was efficient when compared with the state of the art. However, it has the following drawbacks. First, it has no provision for spatio-temporal sentiment analysis of online reviews or Twitter tweets as part of semantic aspect detection and aspect-level sentiment identification. Second, estimating the number of latent topics for efficient probabilistic topic modelling is not included in their model. Third, their model has no provision for deep learning, which leads to mediocre performance in sentiment analysis.
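
For readers less familiar with LDA, the generative process referred to above can be illustrated with a minimal sketch. This is not the model of Hai et al. [41] or the framework proposed in this paper; it only shows the standard LDA assumption that each document mixes a fixed number of topics and each topic is a distribution over words. The vocabulary, the function names and the hyperparameter values (alpha, beta) are hypothetical choices made for illustration. Note that the number of topics is an input that must be fixed in advance, which is the quantity the second drawback concerns.

```python
import random

# Hypothetical toy vocabulary of review words (an assumption for illustration).
VOCAB = ["battery", "screen", "price", "service", "good", "poor"]


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, num_topics, num_docs, doc_len,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate synthetic documents following the LDA generative process.

    num_topics is supplied by the caller; LDA itself does not estimate it.
    """
    rng = random.Random(seed)
    # One word distribution (phi) per topic, drawn from a symmetric Dirichlet.
    phi = [sample_dirichlet([beta] * len(vocab), rng)
           for _ in range(num_topics)]
    docs, thetas = [], []
    for _ in range(num_docs):
        # Per-document topic proportions (theta).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # topic for this word slot
            w = sample_categorical(phi[z], rng)   # word drawn from that topic
            words.append(vocab[w])
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phi


if __name__ == "__main__":
    docs, thetas, _ = generate_corpus(VOCAB, num_topics=3,
                                      num_docs=2, doc_len=8)
    for doc, theta in zip(docs, thetas):
        print([round(t, 2) for t in theta], " ".join(doc))
```

Inference in LDA runs this process in reverse: given only the observed words, it estimates the topic proportions and word distributions for a chosen number of topics.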
To overcome these drawbacks, the proposed research aims to develop a comprehensive framework that combines probabilistic topic modelling with both aspect-level and overall sentiment analysis for sentiment identification. Our contributions in this paper are as follows.
1. We propose a comprehensive framework that considers overall sentiment analysis and aspect-based sentiment analysis with an effective training model.