Interpretation of Sentiment Analysis with Human-in-the-Loop

Vijaya Kumari Yeruva, Dept. of CSEE, University of Missouri-Kansas City, Kansas City, USA (vyq4b@mail.umkc.edu)
Mayanka Chandrashekar, Dept. of CSEE, University of Missouri-Kansas City, Kansas City, USA (mckw9@mail.umkc.edu)
Yugyung Lee, Dept. of CSEE, University of Missouri-Kansas City, Kansas City, USA (leeyu@umkc.edu)
Jeff Rydberg-Cox, Dept. of English, University of Missouri-Kansas City, Kansas City, USA (rydbergcoxj@umkc.edu)
Virginia Blanton, Dept. of English, University of Missouri-Kansas City, Kansas City, USA (blantonv@umkc.edu)
Nathan A. Oyler, Dept. of Chemistry, University of Missouri-Kansas City, Kansas City, USA (oylern@umkc.edu)

Abstract—Human-in-the-Loop has been receiving special attention from the data science and machine learning community. Human feedback offers clear advantages, and manual annotation is often needed to improve machine learning performance. Recent advancements in natural language processing (NLP) and machine learning have created unique challenges and opportunities for digital humanities research. In particular, there are ample opportunities for NLP and machine learning researchers to analyze data from literary texts and to use these complex source texts to broaden our understanding of human sentiment with the human-in-the-loop approach. This paper presents our understanding of how human annotators differ from machine annotators in sentiment analysis tasks and how these differences can inform the design of systems for “human-in-the-loop” sentiment analysis of complex, unstructured texts. We further explore the challenges and benefits of human-machine collaboration for sentiment analysis through a case study in Greek tragedy and address open questions about collaborative annotation of sentiment in literary texts. We focus primarily on (i) an analysis of the challenges that sentiment analysis tasks pose for humans and machines, and (ii) whether multiple human annotators and multiple machine annotators produce consistent annotations. For human annotators, we used a survey-based approach with about 60 college students. For machine annotators, we selected six popular sentiment analysis tools: VADER, CoreNLP’s sentiment annotator, TextBlob, LIME, GloVe+LSTM, and RoBERTa. We conducted a qualitative and quantitative evaluation with the human-in-the-loop approach and confirmed our observations on sentiment tasks using the Greek tragedy case study.

Index Terms—Human-in-the-Loop, Natural Language Processing (NLP), Sentiment Analysis, Greek Tragedy, Machine and Human Annotations, Interactive Machine Learning

I. INTRODUCTION

Human-in-the-Loop has been receiving special attention from the data science and machine learning community [1], [2]. Human feedback offers clear advantages, and manual annotation is often needed to improve machine learning performance. The emergence of these human-in-the-loop methodologies has created interesting new opportunities for digital humanities research. In particular, there are ample opportunities for NLP and machine learning researchers to analyze data from literary texts and to use these complex sources to broaden our understanding of human sentiment with the human-in-the-loop approach.
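To make the machine-annotator setup concrete, the following sketch queries two of the off-the-shelf tools named in the abstract, VADER and TextBlob, on a single line of text and maps their polarity scores to coarse sentiment labels. The example sentence and the +/-0.05 label thresholds are illustrative assumptions on our part (the thresholds follow a convention commonly cited for VADER's compound score), not the configuration used in this study.

# Hedged sketch: two of the six machine annotators (VADER, TextBlob)
# applied to one line of text. The sentence and thresholds below are
# illustrative assumptions, not this paper's experimental settings.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

def to_label(score, pos=0.05, neg=-0.05):
    # Map a polarity score in [-1, 1] to a coarse sentiment label.
    if score >= pos:
        return "positive"
    if score <= neg:
        return "negative"
    return "neutral"

text = "Alas, I weep for the ruin of my house."  # invented example line

vader_score = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
blob_score = TextBlob(text).sentiment.polarity

print("VADER:   ", to_label(vader_score))
print("TextBlob:", to_label(blob_score))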
Storytelling and literary texts are built around formal genre conventions that are essential for effective communication [3]. This underscores the importance of understanding the conventions of literature and their broader cultural contexts. Human-in-the-loop workflows allow us to iteratively gather feedback from human annotators who bring these considerations to bear, improving the ability of computational tools to interpret a broader range of texts. Recent advancements in NLP and deep learning, such as GloVe+LSTM [4] and RoBERTa [5], have created opportunities to integrate human annotation with machine learning for digital humanities research. These tools make it possible to conduct a systematic analysis of sentiments and emotions in large collections of unstructured texts. Our study compares the results of sentiment analysis packages trained on social media with those provided by human annotators. This comparison yields well-annotated data that can be used to improve computational models for sentiment analysis and emotion detection, tasks that remain difficult for machines without human input.

For this study, our work has focused on two primary research questions:

RQ1: What is the level of agreement between multiple human and machine annotators when evaluating sentiment? If the agreement is low, what are the reasons behind it?

RQ2: What are the primary topics associated with the sentiment identified by either humans or machines, and …
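RQ1 concerns agreement among annotators. The paper does not state its agreement statistic at this point; as one common choice for categorical sentiment labels, the sketch below computes pairwise Cohen's kappa with scikit-learn. The annotator names and label sequences are toy data for illustration only.

# Hedged sketch for RQ1: pairwise inter-annotator agreement via
# Cohen's kappa. All labels below are invented toy data; the study's
# actual annotations and agreement measure may differ.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

annotations = {
    "human_1": ["pos", "neg", "neu", "neg", "pos"],
    "human_2": ["pos", "neg", "neg", "neg", "pos"],
    "vader":   ["pos", "neu", "neu", "neg", "pos"],
}

for a, b in combinations(annotations, 2):
    kappa = cohen_kappa_score(annotations[a], annotations[b])
    print(f"{a} vs {b}: kappa = {kappa:.2f}")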