In or Out? Real-Time Monitoring of BREXIT sentiment on Twitter

Laurentiu Vasiliu, Peracton Ltd., Dublin, Ireland, laurentiu.vasiliu@peracton.com
André Freitas, Frederico Caroli, Siegfried Handschuh, University of Passau, Germany, {first.last}@uni-passau.de
Ross McDermott, Manel Zarrouk, Manuela Hürlimann, Brian Davis, Tobias Daudert, Malek Ben Khaled, David Byrne, Insight Centre for Data Analytics, National University of Ireland Galway, Ireland, {first.last}@insight-centre.org
Sergio Fernández, Redlink GmbH, Salzburg, Austria, sergio.fernandez@redlink.co
Angelo Cavallini, 3rdPlace S.r.l., Milan, Italy, angelo.cavallini@3rdplace.com

ABSTRACT
The SSIX (Social Sentiment analysis financial IndeXes) project is a European Innovation Project sponsored by the European Commission under the Horizon 2020 framework. SSIX aims to provide European SMEs with a collection of easy-to-interpret tools to analyse and understand social media sentiment for any given topic, regardless of locale or language. The United Kingdom’s recent referendum on European Union membership, i.e. staying (“Bremain”) or leaving the EU (“Brexit”), was selected as the initial real-world test case for validating the SSIX methodology and platform. In this paper, we briefly describe the SSIX architecture, analyse the platform’s X-Scores metrics and their application to Brexit, and report our initial experimental results and lessons learned.

CCS Concepts
Computing methodologies → Artificial intelligence → Natural language processing → Information extraction.
Computing methodologies → Machine learning → Learning paradigms → Supervised learning by classification.

Keywords
SSIX; Brexit; Natural Language Processing; Machine Learning; Opinion Mining; Twitter; Sentiment Analysis; Political Opinion Mining.

1. INTRODUCTION
The SSIX (Social Sentiment analysis financial IndeXes) project¹ is a European Innovation Project sponsored by the European Commission under the Horizon 2020 framework.
SSIX aims to provide European SMEs with a collection of easy-to-interpret tools to analyse and understand social media users’ opinions on any given topic, regardless of locale or language. The SSIX platform interprets significant sentiment signals in social media conversations and produces sentiment metrics such as sentiment dynamics, sentiment volatility and sentiment momentum.
¹ http://ssix-project.eu/

The recent United Kingdom European Union membership referendum on staying (“Bremain”) or leaving the EU (“Brexit”) was chosen as the first real-world test case for the SSIX consortium [1]. The goal was to stress-test the SSIX platform and the methodology we employ to infer opinion and sentiment from social networks. Furthermore, we analysed a set of rolling metrics called X-Scores, such as raw aggregated sentiment, volume, rolling averages and non-standard technical oscillators such as the relative strength index (RSI), to examine their value in providing insights into sentiment behaviour. These initial tests enabled us to examine the SSIX platform in a real-world scenario for the first time. They provided extremely valuable feedback about both the behaviour of the technology we employed and our fundamental assumptions about extracting sentiment data from social networks, data that will serve various use cases, primarily decision-making.

2. ASSUMPTIONS AND SSIX ARCHITECTURE
As originally foreseen, the SSIX project aims to cover the most important social networks, such as Facebook, Twitter and LinkedIn. For the Brexit exercise, we started with Twitter only, for reasons of technical accessibility. We note that Twitter users do not overlap exactly with the UK voting population but represent only a portion of it [2]. Moreover, it was not easy to determine what constitutes ‘overlap’, since many users do not publicly disclose their tweeting location or place of residence. We attempt to mitigate this by capturing English-language messages only.
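The X-Scores are only named here, not defined. Purely as an illustration of what such rolling metrics can look like, the following Python sketch assumes each tweet has already been assigned a sentiment score in [-1, 1] and a time period (for example an hour index). The function names `aggregate`, `rolling_mean` and `rsi` are hypothetical and are not the SSIX implementation. The RSI here uses simple averages of gains and losses (Cutler's variant) rather than Wilder's exponential smoothing.

```python
from collections import defaultdict


def aggregate(scored_tweets):
    """Per-period mean sentiment and tweet volume.

    Hypothetical input: (period, score) pairs, where `period` is any sortable
    key (e.g. an hour index) and `score` is a sentiment value in [-1, 1].
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for period, score in scored_tweets:
        sums[period] += score
        counts[period] += 1
    periods = sorted(counts)
    sentiment = [sums[p] / counts[p] for p in periods]
    volume = [counts[p] for p in periods]
    return periods, sentiment, volume


def rolling_mean(values, window):
    """Simple moving average; None until `window` values are available."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]


def rsi(values, window=14):
    """Relative strength index of a sentiment series.

    Uses simple averages of gains and losses over the last `window` changes
    (Cutler's variant, not Wilder's smoothing). None until `window` changes
    exist; 100 if there are only gains; 50 for a flat window.
    """
    changes = [b - a for a, b in zip(values, values[1:])]
    out = [None] * len(values)
    for i in range(window, len(values)):
        recent = changes[i - window:i]  # the `window` changes ending at values[i]
        gain = sum(c for c in recent if c > 0) / window
        loss = sum(-c for c in recent if c < 0) / window
        if loss == 0:
            out[i] = 50.0 if gain == 0 else 100.0
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gain / loss)
    return out


if __name__ == "__main__":
    periods, sentiment, volume = aggregate(
        [(0, 0.2), (0, -0.4), (1, 0.5), (2, 0.1), (2, 0.3), (3, -0.2)]
    )
    print(volume)  # [2, 1, 2, 1]
    print(rolling_mean(sentiment, 2))
    print(rsi(sentiment, 2))
```

Applying an oscillator such as RSI to the aggregated sentiment series, rather than to prices, is what makes it "non-standard" in this setting: values near 100 indicate that recent changes in sentiment have been mostly positive, values near 0 mostly negative.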
Overall, 40% of all activity can be attributed to geographical Europe (this includes GMT and similar time zones that cannot be attributed to a single country), while 18% comes from outside Europe. For the remaining 42%, it was not possible to determine the location because the user’s time zone is not set. Next, we present the location and the percentage of sentiment expressed from those locations by Twitter users in some European² countries. This data represents only 33% (2.3 million tweets out of a total of 5.9 million) of the entire data collection. Note that not all users enable location data, so it was not possible to capture this information fully.
² “European” is used here in the geographical sense, covering both EU and non-EU countries.

© 2016 Copyright held by the author/owner(s). SEMANTICS 2016: Posters and Demos Track, September 13-14, 2016, Leipzig, Germany
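The time-zone-based attribution and the English-only filter described above can be sketched as follows, for illustration only. The sketch assumes Twitter API v1.1 tweet objects, in which `lang` holds the detected tweet language and `user.time_zone` holds the user's optional time-zone setting. The `EUROPEAN_TIME_ZONES` set and the function names are hypothetical; the actual SSIX mapping of time zones to regions is not given here.

```python
from collections import Counter

# Hypothetical, deliberately incomplete set of Twitter time-zone names treated
# as geographical Europe. GMT-offset zones such as London, Dublin and Lisbon
# are counted as Europe even though they span several countries.
EUROPEAN_TIME_ZONES = {
    "London", "Dublin", "Edinburgh", "Lisbon", "Amsterdam",
    "Berlin", "Paris", "Madrid", "Rome", "Athens",
}


def location_bucket(tweet):
    """Classify a tweet dict by its author's time zone."""
    time_zone = (tweet.get("user") or {}).get("time_zone")
    if not time_zone:
        return "unknown"  # time zone not set by the user
    return "europe" if time_zone in EUROPEAN_TIME_ZONES else "outside_europe"


def location_shares(tweets):
    """Percentage of English-language tweets in each location bucket."""
    english = [t for t in tweets if t.get("lang") == "en"]
    if not english:
        return {}
    counts = Counter(location_bucket(t) for t in english)
    return {
        bucket: 100.0 * counts[bucket] / len(english)
        for bucket in ("europe", "outside_europe", "unknown")
    }


if __name__ == "__main__":
    sample = [
        {"lang": "en", "user": {"time_zone": "London"}},
        {"lang": "en", "user": {"time_zone": "Eastern Time (US & Canada)"}},
        {"lang": "en", "user": {"time_zone": None}},
        {"lang": "de", "user": {"time_zone": "Berlin"}},  # dropped: not English
    ]
    print(location_shares(sample))
```

In this sketch, tweets whose author has no time zone fall into the `unknown` bucket, which plays the role of the 42% share reported above; the choice of which zones count as Europe is the main judgement call and directly shifts the Europe versus outside-Europe split.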