Normalization of Term Weighting Scheme for Sentiment Analysis

Alexander Pak, Patrick Paroubek, Amel Fraisse, and Gil Francopoulo

Université Paris-Sud 11, Laboratoire LIMSI-CNRS, Bâtiment 508, F-91405 Orsay Cedex, France
alex.pak@blabla.fr, {pap, fraisse}@limsi.fr, Gil@.fr

Abstract. The n-gram model with a binary (or tf-idf) weighting scheme and an SVM classifier is a common baseline in sentiment analysis and opinion mining research. More advanced methods, such as generating additional features or using supplementary linguistic resources, are used on top of this model to improve classification accuracy. In this paper, we show how a simple technique, normalizing the term weights in the basic bag-of-words method, can improve both the overall classification accuracy and the classification of minor reviews. Other systems based on the n-gram model may also benefit from this method. We have tested our approach on the movie review and the product review datasets and show that our normalization technique improves the classification accuracy of the traditional weighting schemes. We work on English in this paper; however, the technique should be considered language independent, since it uses no language-specific resource other than a training corpus. The question remains, though, whether similar performance gains would be observed for other language families.

1 Introduction

The growing interest in sentiment analysis is usually associated with the appearance of web blogs and social networks, where users post and share information about their likes and dislikes, preferences, and lifestyle. Many websites give users the opportunity to leave their opinion on a given object or topic. For example, users of the IMDb¹ website can write a review of a movie they have watched and rate it on a 5-star scale.
As a result, given a large number of reviews and rating scores, IMDb reflects the general opinions of Internet users on movies. Many other movie-related resources, such as cinema schedule websites, use information from IMDb to provide details about movies, including the average rating. Thus, users who write reviews on IMDb influence the choices of other users, who tend to select movies with higher ratings.

¹ The Internet Movie Database: http://imdb.com
² http://twitter.com
³ http://facebook.com

Another example is social networks. It is popular among users of Twitter² or Facebook³ to post messages that are visible to their friends with an opinion on different