Commercial Sentiment Analysis Solutions: A Comparative Study

Tatiana Ermakova 1,2,3 a, Max Henke 4 and Benjamin Fabian 1,4,5,6 b

1 Weizenbaum Institute for the Networked Society, Hardenbergstraße 32, 10623 Berlin, Germany
2 Competence Center of Electronic Safety and Security Systems for the Public and Industries (ESPRI), Fraunhofer Institute for Open Communication Systems (FOKUS), Kaiserin-Augusta-Allee 31, 10589 Berlin, Germany
3 Chair of Open Distributed Systems (ODS), Technical University of Berlin, Einsteinufer 25, 10587 Berlin, Germany
4 Hochschule für Telekommunikation Leipzig (HfTL), Gustav-Freytag-Straße 43-45, 04277 Leipzig, Germany
5 e-Government, Technical University of Applied Sciences Wildau (TH Wildau), Hochschulring 1, 15745 Wildau, Germany
6 Information Systems, Humboldt University of Berlin, Spandauer Str. 1, 10178 Berlin, Germany

Keywords: Sentiment Analysis, Machine Learning, Text Classification, Commercial Service, SaaS, Cloud Computing.

Abstract: Empirical insights into promising commercial sentiment analysis solutions that go beyond their vendors’ claims are rare. Moreover, because the field evolves constantly, earlier studies no longer reflect the current situation. The present research aims to evaluate and compare current solutions. Based on tweets on airline service quality, we test the solutions of six vendors with different market power, namely Amazon, Google, IBM, Microsoft, Lexalytics, and MeaningCloud, and report their accuracy, precision, recall, (macro) F1, time performance, and service level agreements (SLA). For positive and neutral classifications, none of the solutions showed precision of over 70%.
For negative classifications, all of them demonstrate high precision of around 90%; however, only IBM Watson NLU and Google Cloud Natural Language achieve recall of over 70% and thus can be seen as worth considering for application scenarios where negative text detection is a major concern. Overall, our study shows that an independent, critical experimental analysis of sentiment analysis services can provide interesting insights into their general reliability and particular classification accuracy beyond marketing claims, making it possible to critically compare solutions based on real-world data and to analyze potential weaknesses and margins of error before making an investment.

a https://orcid.org/0000-0003-0864-3302
b https://orcid.org/0000-0002-9585-1814

1 INTRODUCTION

With the explosive growth of Web 2.0 applications (e.g., social media platforms), an almost continuous stream of digital, publicly available opinions is generated (Liu, 2015). Sentiment analysis enables automated opinion recognition and polarity classification (Wiegand et al., 2010). Taken together, this offers organizations unprecedented opportunities to support and improve decision-making processes (Lau et al., 2012). Recent research shows that firms can leverage user-generated content in the form of sentiments to predict and/or explain various aspects of their performance, such as sales (Hu & Tripathi, 2015; Jiang et al., 2021; Z. Lin & Goh, 2011), profits (Ho et al., 2019), brand perception (Luo et al., 2017), customer satisfaction and market performance (S. Chung et al., 2017), and stock trade performance (Kim et al., 2017).

Sentiment analysis technologies are quite challenging for companies to select, develop, and/or integrate into their practices. Furthermore, training promising deep learning models requires huge amounts of scarce data, training time, and resources, i.e., GPU support and large memory.
Moreover, deep learning models in particular function like a black box whose sentiment predictions are difficult to interpret, while the choice of hyperparameters is essential to their performance and remains a major challenge (Yadav & Vishwakarma, 2020).

Ermakova, T., Henke, M. and Fabian, B. Commercial Sentiment Analysis Solutions: A Comparative Study. DOI: 10.5220/0010709400003058. In Proceedings of the 17th International Conference on Web Information Systems and Technologies (WEBIST 2021), pages 103-114. ISBN: 978-989-758-536-4; ISSN: 2184-3252. Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

The cloud computing service paradigm enables the provision of virtual machines, development tools, and software on demand (Mell & Grance, 2011). Several commercial “software as a service” (SaaS)