Sentiment Lexicon Creation from Lexical Resources

Bas Heerschop, Alexander Hogenboom, and Flavius Frasincar

Erasmus University Rotterdam
PO Box 1738, NL-3000 DR, Rotterdam, The Netherlands
basheerschop@gmail.com, {hogenboom,frasincar}@ese.eur.nl

Abstract. Today’s business information systems face the challenge of analyzing sentiment in massive data sets for supporting, e.g., reputation management. Many approaches rely on lexical resources containing words and their associated sentiment. We perform a corpus-based evaluation of several automated methods for creating such lexicons, exploiting vast lexical resources. We consider propagating the sentiment of a seed set of words through semantic relations or through PageRank-based similarities. We also consider a machine learning approach using an ensemble of classifiers. The latter approach turns out to outperform the others. However, PageRank-based propagation appears to yield a more robust sentiment classifier.

Keywords: sentiment analysis, sentiment lexicon creation, sentiment propagation, page rank, machine learning.

1 Introduction

Sentiment analysis, also referred to as opinion mining, encompasses a broad area of natural language processing, computational linguistics, and text mining. In general, the aim is to determine the attitude of the author with respect to the subject of the text, which is typically quantified as a polarity. Recent developments on the Web, which enable users to produce an ever-growing amount of virtual utterances of opinions or sentiment through, e.g., messages on Twitter, blogs, or on-line reviews, open up an array of possibilities for business information systems. Mining sentiment in the vast amount of data on the Web has many interesting applications, such as in the analysis of on-line customer reviews, reputation management, or marketing.
Proper tools for sentiment mining can enable businesses to monitor the public sentiment with respect to particular products or brands, which can yield invaluable input for their marketing strategies.

W. Abramowicz (Ed.): BIS 2011, LNBIP 87, pp. 185–196, 2011. © Springer-Verlag Berlin Heidelberg 2011

In recent work, we assessed the state of the art in sentiment analysis [1]. We showed that many approaches essentially rely on a lexicon containing words or phrases and their associated sentiment scores. Such lexicons often need to be created first. Automated methods include supervised learning on a set of manually rated documents and learning through related word expansion – expanding a small, manually created set of words by exploiting word relationships such