AN EMPIRICAL STUDY ON SENTIMENT POLARITY CLASSIFICATION OF BOOK REVIEWS

FAIZA SHAHZADI¹ AND TEHSEEN ZIA²

¹ Department of Computer Science & Information Technology, University of Sargodha, Fz_noor@yahoo.com
² Department of Computer Science & Information Technology, University of Sargodha, Tehseen.zia@uos.edu.pk

ABSTRACT. Sentiment polarity classification deals with the automatic classification of text into sentiment polarity categories. Most proposed approaches to polarity classification rely on a dictionary of polarity-based terms, but such dictionaries are not readily available. We have instead adopted a machine learning approach in which classifiers are trained on a self-collected corpus of book reviews annotated with sentiment categories. In this paper, we present our evaluation of the performance of these machine learning classifiers. Five classifiers are evaluated, including naïve Bayes, k-nearest neighbor, decision tree and support vector machine. Naïve Bayes gave the best results.

Keywords: Opinion Mining; Text Categorization; Sentiment Analysis; Sentiment Classification; Book Review.

1. Introduction

Text categorization (TC) is a popular methodology for processing textual data [1]. Based on the content of a text, TC methods assign it to predefined categories. Driven by the need to automatically index, organize, summarize and search the enormous amount of online textual data, TC is gaining popularity in information retrieval (IR). While TC deals with topic-based classification of text, opinion mining is another area of IR that processes text on a non-topical basis. For example, detecting and extracting opinions, feelings and emotions related to a specific subject are common opinion mining tasks. Distinguishing sentiment expressions (such as positive, negative or neutral) within text is a subtask of opinion mining known as sentiment polarity classification or identification [2].
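The training-based approach described in the abstract, where a classifier learns sentiment polarity categories from annotated reviews rather than from a term dictionary, can be illustrated with a minimal sketch. The features, preprocessing and implementation used in the study are not given here, so the following assumes a bag-of-words representation, a simple regex tokenizer, and a multinomial naïve Bayes classifier with add-one (Laplace) smoothing. The class `ToyNaiveBayes` is hypothetical illustration code, and the four training reviews are invented toy examples, not data from the study's corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class ToyNaiveBayes:
    """Hypothetical multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)        # number of training reviews per class
        self.word_counts = defaultdict(Counter)  # class -> word -> frequency
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)            # vocabulary = every word seen in training
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # log P(c) + sum over words of log P(w | c), smoothed over the vocabulary
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for word in tokenize(doc):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][word] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest (unnormalized) log posterior
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Invented toy reviews for illustration only
train_docs = [
    "a wonderful book, beautifully written",
    "terrific plot and beautiful characters",
    "a bad book, boring and badly written",
    "boring plot, I did not like it",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = ToyNaiveBayes().fit(train_docs, train_labels)
print(model.predict("a wonderful and terrific story"))  # -> positive
print(model.predict("boring and bad"))                  # -> negative
```

Note that no polarity dictionary appears anywhere in the sketch: the words "wonderful" and "boring" acquire their polarity only from the annotated training reviews, which is the convenience the TC approach offers over dictionary-based methods.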
Most proposed polarity classification approaches rely on polarity-related terms (such as wonderful, terrific, beautiful and bad) to classify sentiment polarity. Although these solutions work well, they require a dictionary of polarity-related terms. This requirement raises two issues. First, identifying such terms and building a dictionary is a challenging task, and such dictionaries are not readily available. Second, because these methods rely on a specific collection of terms, they are not easy to adapt. We have therefore analyzed the suitability of a TC approach for sentiment polarity classification of text. This approach is more convenient, since it is easier to gather opinions and annotate them with polarity categories than to build a dictionary of polarity-related terms. It is also easier to adapt. A prototype sentiment polarity system has broad scope for end users, who are often interested in getting more information about a book before purchasing it: most websites maintain the opinions of people who have already commented on a given book. These opinions can be utilized as a valuable source of

VFAST Transactions on Software Engineering, http://vfast.org/journals/index.php/VTSE, © 2014, ISSN(e): 2309-3978; ISSN(p): 2411-6246, Volume 2, Number 1, January-December, 2014, pp. 01-07