International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 6, December 2023, pp. 6913~6925
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i6.pp6913-6925
Journal homepage: http://ijece.iaescore.com

A large-scale sentiment analysis using political tweets

Yin Min Tun, Myo Khaing
Faculty of Computer Science, University of Computer Studies, Mandalay, Myanmar

Article Info
Article history:
Received Aug 19, 2022
Revised Apr 26, 2023
Accepted Jun 26, 2023

ABSTRACT
Twitter has become a key element of political discourse in candidates’ campaigns. Political polarization on Twitter matters to politicians because Twitter is a popular public medium for analyzing and predicting public opinion about political events. Sentiment analysis of political tweets depends mainly on the quality of the sentiment lexicons used, so it is crucial to construct lexicons of the highest possible quality. In the proposed system, a domain-specific political lexicon is constructed with a supervised approach that extracts extreme political opinion words and features from tweets. A political multi-class sentiment analysis (PMSA) system is developed on a big data platform to predict the inclination of tweets and infer election outcomes, with the analysis conducted on different political datasets, including the Trump election dataset and BBC News politics. The experiments compare political text classification using three models: multinomial naïve Bayes (MNB), decision tree (DT), and linear support vector classification (SVC). Of the three, linear SVC performs better than the other two techniques. The evaluation results show that the proposed system achieves 98% accuracy with linear SVC.
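To make the multi-class classification step concrete, the following is a minimal, hypothetical sketch of the first of the three compared models, multinomial naïve Bayes, written from scratch in plain Python. It is not the authors' Spark-based implementation, and the class name `MultinomialNB` here refers only to this sketch. The toy "tweets", the three labels (positive, negative, neutral), and the whitespace tokenizer are illustrative assumptions, not data or preprocessing from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace only (a simplifying assumption);
    # a real tweet pipeline would also strip URLs, mentions, and punctuation.
    return text.lower().split()

class MultinomialNB:
    """From-scratch multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)                # documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)         # token frequencies per class
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)   # log prior P(c)
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(doc):
                if tok in self.vocab:                         # skip words unseen in training
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)                    # argmax over classes

def accuracy(model, labelled):
    correct = sum(model.predict(text) == label for text, label in labelled)
    return correct / len(labelled)

# Hypothetical toy "tweets"; not taken from the Trump election or BBC News datasets.
train = [
    ("great speech strong leader", "positive"),
    ("love this candidate great plan", "positive"),
    ("terrible policy weak leader", "negative"),
    ("hate this corrupt campaign", "negative"),
    ("debate tonight at eight", "neutral"),
    ("rally scheduled for tuesday", "neutral"),
]
test = [
    ("great leader", "positive"),
    ("corrupt weak policy", "negative"),
    ("debate on tuesday", "neutral"),
]

texts, labels = zip(*train)
model = MultinomialNB(alpha=1.0).fit(texts, labels)
print(accuracy(model, test))  # prints 1.0
```

Scores are accumulated as log-probabilities so that long documents do not underflow to zero; the decision tree and linear SVC models in the comparison would normally come from a library rather than hand-written code.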
Keywords: Apache Flume; Apache Spark; Big data analytics; Machine learning; Sentiment analysis; Social media data

This is an open access article under the CC BY-SA license.

Corresponding Author:
Yin Min Tun
Faculty of Computer Science, University of Computer Studies Mandalay, Myanmar
Email: yinmintun@ucsm.edu.mm

1. INTRODUCTION
Social media have become increasingly essential, and Twitter plays a vital role in campaigning during elections. Twitter is one of the most widely used social media platforms, giving people the freedom to share their opinions, thoughts, and beliefs with the world. Politicians, journalists, political strategists, and citizens increasingly use Twitter as a major network for discussing public issues. Governments and politicians constantly monitor social media networks to see how people are responding to different policies, amendments, and acts. Political scientists working with data from Google, Facebook, or other large datasets may need to understand big data architectures and new distributed methods for processing huge datasets. Political scientists can also focus more on new software for data cleaning, data management, reproducible science, data lifecycle management, and data visualization.

In the era of big data, data are collected from various sources, such as mobile devices and web browsers, and stored in various formats. Traditional storage and analytics platforms cannot handle such a variety of structured and unstructured data. Hadoop, a good platform for big data analytics, offers scalability, cost-efficiency, parallel processing, availability, flexibility, and fast and secure authentication. Hadoop, an open-source framework, comprises a storage part called the Hadoop distributed file system (HDFS) and a processing part called MapReduce. Sentiment analysis (SA), one of the big data applications, focuses on analyzing big data in various ways, and