© 2020 JETIR November 2020, Volume 7, Issue 11 www.jetir.org (ISSN-2349-5162) JETIR2011265 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org 932

SENTIMENT ANALYSIS ON BANGLA YOUTUBE COMMENTS USING MACHINE LEARNING TECHNIQUES

1 VEERANKI LAKSHMI DURGA, 2 A. MARY SOWJANYA
1 M.Tech, 2 Assistant Professor
Dept of CSSE, Andhra University College Of Engineering (A), Visakhapatnam, AP, India.

Abstract: Sentiment Analysis (SA) is an opinion mining study that analyses people's opinions, sentiments, evaluations and appraisals towards societal entities such as products, services, individuals, organizations and events. Most recent research on SA in natural language processing (NLP) has focused on the English language. Bangla, by contrast, lacks a proper dataset that is both large and standard. As a result, recent SA research on Bangla has fallen short of producing results that are comparable to work done in other languages and reusable in future research. In this work, a substantial textual dataset of both Bangla and Romanized Bangla texts is provided; it is the first of its kind and has been post-processed, validated multiple times, and made ready for SA implementation and experiments. Further, this project scrapes video information from YouTube and labels each data sample as one of three categories: positive (1), negative (0) and neutral. This work uses real-time analytics, meaning that data is analyzed as soon as it becomes available, so that insights can be produced without delay.

Keywords: Web-scraping; Bangla language; Romanized Bangla; Sentiment Analysis; TextBlob

I. INTRODUCTION

Bangla is spoken as the first language by almost 200 million people worldwide, 160 million of whom are Bangladeshi [1].
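As an illustration of the three-way labeling and the real-time processing described in the abstract, the following Python sketch labels comments one at a time as they arrive. It assumes a polarity scorer returning values in [-1, 1]; the scorer, the `margin` threshold, the function names and the toy Romanized Bangla lexicon are hypothetical and are not taken from this work. Because the abstract gives no numeric code for the neutral class, the string "neutral" is used as a placeholder. A library scorer such as TextBlob's `sentiment.polarity` (listed among the keywords) also returns values in [-1, 1], although its default analyzer is built for English text.

```python
from typing import Callable, Iterable, Iterator, Tuple, Union

# Label codes: positive = 1 and negative = 0 as in the abstract. No numeric
# code is given for neutral, so a string is used here as a placeholder.
POSITIVE = 1
NEGATIVE = 0
NEUTRAL = "neutral"

Label = Union[int, str]


def label_from_polarity(score: float, margin: float = 0.05) -> Label:
    """Map a polarity score in [-1.0, 1.0] to one of three categories.

    Scores within +/- margin of zero are treated as neutral. The margin
    value is an assumption, not a value taken from the paper.
    """
    if score > margin:
        return POSITIVE
    if score < -margin:
        return NEGATIVE
    return NEUTRAL


def label_stream(
    comments: Iterable[str],
    scorer: Callable[[str], float],
) -> Iterator[Tuple[str, Label]]:
    """Label each comment as soon as it is pulled from the stream.

    Being a generator, it yields a result per comment immediately instead
    of waiting for the whole batch to be collected first.
    """
    for comment in comments:
        yield comment, label_from_polarity(scorer(comment))


if __name__ == "__main__":
    # Hypothetical toy scorer standing in for a real polarity model.
    toy_lexicon = {"valo": 1.0, "kharap": -1.0}

    def toy_scorer(text: str) -> float:
        hits = [toy_lexicon[w] for w in text.lower().split() if w in toy_lexicon]
        return sum(hits) / len(hits) if hits else 0.0

    incoming = iter(["video ta valo", "sound kharap", "ami dekhlam"])
    for text, label in label_stream(incoming, toy_scorer):
        print(label, "|", text)
```

Processing comments through a generator keeps memory use constant and lets each label be reported the moment its comment is read, which is the essence of the real-time setting described above.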
Bangladeshi people are increasingly involved in online activities such as connecting with friends and family through social media, expressing opinions and thoughts on popular micro-blogging and social networking sites, commenting on online news portals, and shopping through online marketplaces and similar applications. However, it is becoming increasingly hard for such businesses to monitor and analyze market trends, especially when this is done by analyzing customers' reactions to their products or services, because these businesses involve little or no human-to-human interaction. Moreover, going through the comments and reviews of each individual customer and working out the sentiment they express is tedious and in some cases simply intractable, since very high volumes of data are generated very quickly in this age of digital connectivity. Therefore, automated Sentiment Analysis (SA) can play a vital role here in enhancing efficiency and productivity. SA is widely employed as a machine learning application in many areas and is known by many other terms, e.g. opinion extraction, sentiment mining, opinion mining, subjectivity analysis, emotion analysis and review mining. Most of the research on SA is based on the English language, while Bangla SA is still at a formative stage. An interesting work by Das and Bandyopadhyay [2] on subjectivity detection included Bangla, but it is not self-sufficient, as English is also needed. However, none of these works truly considered Bangladesh's perspective. We need to consider not just standardized Bangla, but also Banglish (Bangla words mixed with English words) and Romanized Bangla.
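To make the distinction between these input types more concrete, the Python sketch below gives a rough, hypothetical heuristic (not the method used in this work) that checks which scripts a comment uses, based on the Unicode Bengali block (U+0980 to U+09FF). It can flag Bangla-script, Latin-script and mixed-script text, but it cannot tell Romanized Bangla from English or detect Banglish written entirely in Latin letters, which illustrates why these types are hard to separate automatically.

```python
# Unicode Bengali block, which contains the Bangla script.
BENGALI_START = 0x0980
BENGALI_END = 0x09FF


def script_profile(text: str) -> str:
    """Roughly classify a comment by the scripts its characters come from.

    Returns "bangla" (Bengali-block characters and no ASCII letters),
    "latin" (ASCII letters only, e.g. Romanized Bangla or English),
    "mixed" (both), or "none" (neither). It cannot tell Romanized
    Bangla from English, since both use Latin letters.
    """
    has_bangla = any(BENGALI_START <= ord(ch) <= BENGALI_END for ch in text)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    if has_bangla and has_latin:
        return "mixed"
    if has_bangla:
        return "bangla"
    if has_latin:
        return "latin"
    return "none"


if __name__ == "__main__":
    for sample in ["ভিডিওটা ভালো", "video ta valo", "ভিডিওটা awesome", "123 !!"]:
        print(script_profile(sample), "|", sample)
```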
These three major types can in turn be loosely categorized as good, standard, bad, wrong, totally wrong, particular to a specific location (almost arcane), etc., depending on clarity, grammatical correctness, meaningfulness, personal idiosyncrasies, the impact of localization, and so on. Moreover, Romanized Bangla adds complexity because transliteration varies between people who know English well and those who do not [3]. The fact that no clear standard is followed when 160 million Bangladeshi people write in any of these forms makes working with the text all the more complicated and challenging. In the recent past, deep learning methods, specifically recurrent deep learning models, have enjoyed a lot of success in Natural Language Processing (NLP) compared to more traditional machine learning methods [4]. While there are other approaches to SA, in this research we concentrate exclusively on deep learning based techniques. Our key contributions are:
• Web-scraping of Bangla and Romanized Bangla text samples from YouTube, where each sample was annotated by two adult Bangla speakers.
• Pre-processing the data so that it is readily usable by researchers.
• Application of deep recurrent models to the Bangla and Romanized Bangla text corpus.
• Pre-training on the data annotated with one label set and then training on the other (and vice versa), to see whether this gives better results.

The paper is organized as follows. Section 2 discusses the background of our work and related work by others in the same field that inspired and helped us. Section 3 describes in detail the dataset used in our experiments. Section 4 discusses the methodology, including the experimental setup for the deep recurrent models. Section 5 discusses the results of our experiments, and Section 6 concludes the article.
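The pre-processing contribution and the transliteration variation noted above can be illustrated with a small normalization step for Latin-script (Romanized Bangla) comments. The Python sketch below is a hypothetical example, not the pre-processing pipeline used in this work: it lowercases text, strips punctuation and emoji, shortens elongated letters, and maps a few assumed spelling variants (e.g. "valo" and "balo" for "bhalo") to one canonical form. The `VARIANTS` table is illustrative only; a real table would need to be derived from the annotated corpus.

```python
import re

# Hypothetical spelling-variant table: several Romanized spellings of the
# same Bangla word mapped to one canonical form. Illustrative only.
VARIANTS = {
    "valo": "bhalo",     # "good"
    "balo": "bhalo",
    "kharab": "kharap",  # "bad"
}

# A Latin letter repeated three or more times, e.g. the "ooo" in "valooo".
_ELONGATED = re.compile(r"([a-z])\1{2,}")
# Anything that is not a lowercase ASCII letter, digit or whitespace.
# Note: this also removes Bengali characters, so the function is meant
# for Latin-script comments only.
_NON_WORD = re.compile(r"[^a-z0-9\s]")


def normalize_romanized(text: str) -> str:
    """Normalize a Romanized Bangla comment into space-separated tokens."""
    text = text.lower()
    text = _NON_WORD.sub(" ", text)      # punctuation and emoji -> spaces
    text = _ELONGATED.sub(r"\1", text)   # "valooo" -> "valo"
    tokens = [VARIANTS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_romanized("Video ta VALOOO!!!"))  # video ta bhalo
    print(normalize_romanized("kharab sound"))        # kharap sound
```

Mapping spelling variants onto a single token keeps the vocabulary smaller, so a downstream model sees one feature rather than several for the same word.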