Automatic Annotation Performance of TextBlob and VADER on Covid Vaccination Dataset

Badriya Murdhi Alenzi, Muhammad Badruddin Khan, Mozaherul Hoque Abul Hasanat, Abdul Khader Jilani Saudagar*, Mohammed AlKhathami and Abdullah AlTameem

Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11432, Saudi Arabia

*Corresponding Author: Abdul Khader Jilani Saudagar. Email: aksaudagar@imamu.edu.sa

Received: 07 December 2021; Accepted: 20 January 2022

Abstract: With the recent boom in the corpus size of sentiment analysis tasks, automatic annotation is poised to be a necessary alternative to manual annotation for generating ground-truth dataset labels. This article investigates and validates the performance of two widely used lexicon-based automatic annotation approaches, TextBlob and Valence Aware Dictionary and Sentiment Reasoner (VADER), by comparing them with manual annotation. A dataset of 5402 Arabic tweets was annotated manually, yielding 3124 positive, 1463 negative, and 815 neutral tweets. The tweets were translated into English so that TextBlob and VADER could be used for their annotation. TextBlob and VADER automatically classified the tweets into positive, negative, and neutral sentiments, and the results were compared with the manual annotation. This study shows that automatic annotation cannot be trusted as the gold standard for annotation. In addition, the study discusses several drawbacks and limitations of automatic annotation using lexicon-based algorithms. The highest accuracies achieved were 75% by TextBlob and 70% by VADER.
Keywords: Sentiment analysis; lexicon-based approach; VADER; TextBlob; automatic annotation

1 Introduction

Over the past two decades, sentiment analysis, or opinion mining, has evolved into a valuable tool for understanding people's emotions, with a wide range of applications in fields such as public health, marketing, sociology, and politics. Creating a ground-truth dataset by annotating the data with appropriate sentiment labels indicating positive, negative, and neutral emotion is essential for any sentiment analysis work that uses a supervised learning approach. Traditionally, in supervised sentiment analysis, or more generally in any supervised machine learning (ML) approach, dataset annotations are performed by human experts in the respective domain. In sentiment analysis, manual annotations are considered the most accurate reflection of human emotions expressed in any natural language corpus. Hence, manual annotations are the gold standard in any sentiment analysis task [1]. The idea that human expert

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Intelligent Automation & Soft Computing, DOI: 10.32604/iasc.2022.025861, Article, Tech Science Press
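The comparison described above reduces to mapping each lexicon polarity score to a positive/negative/neutral label and measuring agreement with the manual annotation. The sketch below illustrates this step with hardcoded scores so it stays self-contained; in practice the scores would come from TextBlob (`TextBlob(text).sentiment.polarity`) and VADER (`SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`). The ±0.05 cutoff follows VADER's documented convention; applying the same cutoff to TextBlob polarity, and the sample scores themselves, are assumptions for illustration.

```python
def score_to_label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The +/-0.05 cutoff is VADER's documented convention; using it for
    TextBlob polarity as well is an assumption made for this sketch.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def annotation_accuracy(auto_scores, manual_labels):
    """Fraction of automatic labels that agree with manual annotation."""
    auto_labels = [score_to_label(s) for s in auto_scores]
    agree = sum(a == m for a, m in zip(auto_labels, manual_labels))
    return agree / len(manual_labels)

# Hypothetical polarity scores for four translated tweets (illustrative
# values only, not taken from the paper's dataset).
scores = [0.62, -0.40, 0.01, 0.30]
manual = ["positive", "negative", "neutral", "negative"]
print(annotation_accuracy(scores, manual))  # 3 of 4 agree -> 0.75
```

The same agreement measure, computed over all 5402 tweets against the manual labels, is what yields the 75% (TextBlob) and 70% (VADER) accuracy figures reported in the abstract.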