Information Processing and Management 52 (2016) 20–35
Expressive signals in social media languages to improve polarity detection
E. Fersini∗, E. Messina, F.A. Pozzi
DISCo, University of Milano-Bicocca, Viale Sarca, 336 – 20126 Milano, Italy
Article info
Article history:
Received 21 May 2014
Revised 7 April 2015
Accepted 11 April 2015
Available online 12 June 2015
Keywords:
Sentiment analysis
Polarity detection
Expressive signals
Abstract
Social media represents an emerging and challenging sector where people's natural language
expressions can be easily reported through blogs and short text messages. This is rapidly
creating unique content of massive scale that needs to be analyzed efficiently and effectively
to create actionable knowledge for decision-making processes. A key piece of information that
can be grasped from social environments relates to the polarity of text messages. To better
capture the sentiment orientation of the messages, several valuable expressive forms could
be taken into account. In this paper, three expressive signals typically used in microblogs
have been explored: (1) adjectives, (2) emoticons, emphatic and onomatopoeic expressions,
and (3) expressive lengthening. Once a text message has been normalized so that it better
conforms to a canonical language, the considered expressive signals have been used
to enrich the feature space and train several baseline and ensemble classifiers aimed at
polarity classification. The experimental results show that adjectives are more discriminative
and have a greater impact than the other expressive signals considered.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
The goal of sentiment analysis is to develop automatic tools able to extract subjective information, such as opinions and
sentiments, from natural language texts, in order to create structured and actionable knowledge to be used by either a decision
support system or a decision maker. This task is usually addressed at the document level (Yessenalina et al., 2010), under the
naive assumption that each document expresses a single overall sentiment. When dealing with social media content coming from
microblogs (such as Facebook and Twitter), a finer level of granularity could be more useful and informative (Jagtap and Pawar, 2013;
Zhang et al., 2011). This new kind of virtual communication has led to new types of content and diffusion models that need to
be modeled explicitly, starting from the language. Well-formed content (e.g., reviews) is distinguished from microblog messages
by its use of canonical, coherent pieces of text of at least paragraph length. Sentiment analysis on social media, however,
leads to new and more complex scenarios: the sentiment is conveyed in passages of at most two sentences, often with an
informal linguistic register and non-standard spelling (Eisenstein, 2013). These novel scenarios have led
researchers to move from a traditional approach, which solves the sentiment analysis task by using machine learning models
(Pang and Lee, 2008), to a communication-oriented paradigm.
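To make the kind of signal carried by informal register and non-standard spelling more concrete, the following is a minimal sketch of how expressive lengthening and emoticons might be detected in a short message and turned into extra features. It is an illustration only, not the authors' implementation: the regular expressions, the function names normalize_lengthening and expressive_features, and the choice of collapsing repeated letters to two occurrences are all assumptions made for this example.

```python
import re

# Hypothetical patterns, chosen for illustration only (not taken from the paper).
# A letter repeated three or more times in a row, e.g. the "ooo" in "looove".
LENGTHENING = re.compile(r"([^\W\d_])\1{2,}")
# A small subset of Western-style emoticons such as :) ;-) =D :P
EMOTICON = re.compile(r"[:;=][-']?[()DPp]")


def normalize_lengthening(token: str) -> str:
    """Collapse runs of 3+ identical letters to two letters.

    Assumption: two letters is only a heuristic; recovering the canonical
    word in general would need a dictionary lookup.
    """
    return LENGTHENING.sub(r"\1\1", token)


def expressive_features(message: str) -> dict:
    """Count simple expressive signals in a whitespace-tokenized message."""
    tokens = message.split()
    return {
        "n_lengthened": sum(1 for t in tokens if LENGTHENING.search(t)),
        "n_emoticons": len(EMOTICON.findall(message)),
        "normalized": " ".join(normalize_lengthening(t) for t in tokens),
    }


if __name__ == "__main__":
    print(expressive_features("I looove this sooo much :-)"))
```

Counts of this kind could be appended to a bag-of-words representation before training a classifier; whether they actually help is an empirical question.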
The first expressive signals considered in the literature to aid the detection of sentiment in a given
message are lexical elements (e.g., adjectives, verbs, adverbs). Pak and Paroubek (2010) investigated the
∗ Corresponding author.
E-mail addresses: fersini@disco.unimib.it (E. Fersini), messina@disco.unimib.it (E. Messina), federico.pozzi@disco.unimib.it (F.A. Pozzi).
http://dx.doi.org/10.1016/j.ipm.2015.04.004
0306-4573/© 2015 Elsevier B.V. All rights reserved.