Information Processing and Management 52 (2016) 20–35
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/ipm

Expressive signals in social media languages to improve polarity detection

E. Fersini ∗, E. Messina, F.A. Pozzi
DISCo, University of Milano-Bicocca, Viale Sarca, 336 – 20126 Milano, Italy

∗ Corresponding author.
E-mail addresses: fersini@disco.unimib.it (E. Fersini), messina@disco.unimib.it (E. Messina), federico.pozzi@disco.unimib.it (F.A. Pozzi).

Article info
Article history: Received 21 May 2014; Revised 7 April 2015; Accepted 11 April 2015; Available online 12 June 2015
Keywords: Sentiment analysis; Polarity detection; Expressive signals
DOI: http://dx.doi.org/10.1016/j.ipm.2015.04.004
ISSN: 0306-4573

Abstract

Social media represents an emerging and challenging sector in which people's natural language expressions can easily be published through blogs and short text messages. This is rapidly producing unique content of massive dimensions that needs to be analyzed efficiently and effectively to create actionable knowledge for decision-making processes. A key piece of information that can be gleaned from social environments is the polarity of text messages. To better capture the sentiment orientation of messages, several valuable expressive forms can be taken into account. In this paper, three expressive signals typically used in microblogs are explored: (1) adjectives, (2) emoticons, emphatic and onomatopoeic expressions, and (3) expressive lengthening. Each text message is first normalized so that social media posts better conform to a canonical language. The considered expressive signals are then used to enrich the feature space and to train several baseline and ensemble classifiers for polarity classification. The experimental results show that adjectives are more discriminative and have a greater impact than the other expressive signals considered.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

The goal of sentiment analysis is to define automatic tools able to extract subjective information, such as opinions and sentiments, from natural language texts. The extracted information is used to create structured and actionable knowledge for either a decision support system or a decision maker. This task is usually addressed at the document level (Yessenalina et al., 2010), under the naive assumption that each document expresses a single overall sentiment. When dealing with social media content from microblogs (like Facebook and Twitter), a finer level of granularity could be more useful and informative (Jagtap and Pawar, 2013; Zhang et al., 2011). This new kind of virtual communication has led to new types of content and diffusion models, which need to be modeled explicitly, starting from the language.

Well-formed content (e.g., reviews) differs from microblog messages in that it consists of canonical, coherent pieces of text that are at least a paragraph long. Sentiment analysis on social media, by contrast, leads to new and more complex scenarios: the sentiment is conveyed in passages of at most two sentences, often in an informal linguistic register and with non-standard spelling (Eisenstein, 2013). These novel scenarios have led researchers to move from a traditional approach, which solves the sentiment analysis task with machine learning models (Pang and Lee, 2008), to a communication-oriented paradigm.

The first expressive signals considered in the literature to aid the detection of sentiment in a message concern lexical elements (e.g., adjectives, verbs, adverbs). Pak and Paroubek (2010) investigated the