D. Madhavi, R. Piryani and V. K. Singh

Madhavi Devaraj is with the Department of Computer Science and Engineering, Uttar Pradesh Technical University, Lucknow-226021, India (e-mail: madhavidevaraj@gmail.com). Rajesh Piryani is with the Department of Computer Science, South Asian University, New Delhi-110021, India (e-mail: rajesh.piryani@gmail.com). Vivek Kumar Singh is with the Department of Computer Science, South Asian University, New Delhi-110021, India (corresponding author; phone: +91-11-24195148, +91-9971995005; fax: +91-11-24122511; e-mail: vivekks12@gmail.com).

ABSTRACT

This paper presents our experimental work towards detecting the sentiment polarity of free-form texts: first by using an ensemble of sentiment lexicons, and then through a lexicon-pooled machine learning classifier. In the ensemble design, we combined four different sentiment lexicons in different ways to determine the sentiment polarity of different text data. The ensemble approach, however, did not achieve the superior performance initially expected. Therefore, in the second design we pooled the sentiment lexicon knowledge into the classification process of a multinomial Naïve Bayes classifier. The experimental designs are evaluated on three document datasets and two sentence datasets. The lexicon-pooled approach obtains higher accuracy than both the standard Naïve Bayes classifier and the lexicon-based methods. Further, as the amount of training data decreases, the accuracy of the lexicon-pooled machine learning classifier decays more slowly than that of the standalone Naïve Bayes classifier. The framework presented proves useful and robust and can be extended to any classification task.

Keywords: Ensemble, Lexicon-pooling, Naïve Bayes, Opinion Mining, Sentiment Analysis.

1. INTRODUCTION

Sentiment polarity detection is a language processing task that uses an algorithmic formulation to categorize an opinionated text into either a ‘positive’ or a ‘negative’ sentiment class (or sometimes a ‘neutral’ class, equivalent to having no opinion polarity). It is part of the broader task of sentiment analysis, which is formally defined over a quintuple $\langle O_i, F_{ij}, S_{kijl}, H_k, T_l \rangle$, where $O_i$ is the target object, $F_{ij}$ is a feature of the object $O_i$, $S_{kijl}$ is the sentiment polarity (positive, negative or neutral) of the opinion of holder $k$ on the $j$-th feature of object $i$ at time $l$, $H_k$ is the opinion holder, and $T_l$ is the time when the opinion is expressed [1].

Sentiment analysis is now a very useful task across a wide variety of domains, from commercial use by organizations to identify customer attitudes and opinions about products and services, to assessing the election prospects of political candidates. The huge amount of user-created information on the new participative World Wide Web is of immense potential value to companies, which want to know the feedback about their products or services, as well as to analysts who use it for different predictive purposes. This feedback helps all of them make informed decisions. However, the large number of reviews causes information overload in the absence of automated methods for computing their sentiment polarities. Sentiment analysis fills this gap by producing a sentiment profile computed from a large number of user reviews about products, services, etc.

In recent years, attempts have been made to carry out the sentiment analysis task at different levels of detail: document-level (assigning a polarity to the whole document), sentence-level (assigning a polarity label to each sentence in a text) or aspect/feature-level (identifying positive and negative aspects in a text).
A ‘positive’ label denotes that the document or sentence concerned expresses an overall positive opinion, whereas a ‘negative’ label means that it expresses an overall negative opinion of the user. Sometimes the degree or strength of positivity or negativity is also computed.

There are broadly two kinds of approaches to sentiment analysis: those based on machine learning classifiers and those based on sentiment lexicons. The machine learning classifiers usually follow a supervised learning paradigm: they must be trained on labeled data before they can be applied to the actual sentiment classification task. In the past, a variety of machine learning classifiers have been used for sentiment analysis, such as Naïve Bayes, Support Vector Machine and Maximum Entropy classifiers. The lexicon-based methods, on the other hand, do not require prior training and employ a sentiment dictionary to compute the sentiment polarity of a text. Both approaches have their own advantages and disadvantages.

In this paper, we report our experimental work on two different algorithmic approaches for detecting sentiment polarity in unstructured free-form texts. First, we implemented an ensemble of sentiment lexicons for sentiment polarity detection by combining knowledge from different sentiment lexicons. The idea is to combine the advantages of different lexicons.

Lexicon Ensemble and Lexicon Pooling for Sentiment Polarity Detection
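
As a rough illustration of the lexicon-ensemble idea introduced above, the following Python sketch labels a text with several lexicons and combines the labels by majority vote. It is not the combination scheme evaluated in this paper, which is not specified in this section. The three toy lexicons, the function names `lexicon_polarity` and `ensemble_polarity`, the whitespace tokenization and the majority-vote rule are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy lexicons: word -> polarity score (+1 positive, -1 negative).
# Real sentiment lexicons are far larger; these entries are illustrative only.
LEXICON_A = {"good": 1, "great": 1, "bad": -1, "poor": -1}
LEXICON_B = {"good": 1, "excellent": 1, "bad": -1, "boring": -1}
LEXICON_C = {"great": 1, "excellent": 1, "poor": -1, "boring": -1}


def lexicon_polarity(tokens, lexicon):
    """Label a token list by the sign of its summed lexicon scores."""
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def ensemble_polarity(text, lexicons):
    """Majority vote over per-lexicon labels (assumed rule); ties give 'neutral'."""
    # Naive lowercase + whitespace tokenization; punctuation is not stripped.
    tokens = text.lower().split()
    votes = Counter(lexicon_polarity(tokens, lex) for lex in lexicons)
    ranked = votes.most_common()
    if not ranked:
        return "neutral"  # no lexicons supplied
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"  # no single label has the most votes
    return ranked[0][0]


if __name__ == "__main__":
    lexicons = [LEXICON_A, LEXICON_B, LEXICON_C]
    print(ensemble_polarity("a great and excellent film", lexicons))
    print(ensemble_polarity("bad and boring plot", lexicons))
```

Majority voting is only one possible way to combine lexicons; alternatives such as summing or averaging the per-lexicon scores before taking the sign would fit the same interface.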