International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 9, September 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY
Accurate Sentiment Analysis using Enhanced Machine Learning Models
Rincy Jose¹, Varghese S Chooralil²
¹Department of Computer Science and Engineering, Rajagiri School of Engineering and Technology, Kochi, India
Abstract: Sentiment analysis is the computational study of opinions, sentiments, subjectivity, evaluations, attitudes, views and emotions expressed in text. It is mainly used to classify reviews as positive, negative or neutral with respect to a query term. This is useful for consumers who want to analyse the sentiment about products before purchase, or for viewers who want to know the public sentiment about a newly released movie. Here I present the results of machine learning algorithms for classifying the sentiment of movie reviews, using a chi-squared feature selection mechanism during training. I show that machine learning algorithms such as Naive Bayes and Maximum Entropy can achieve competitive accuracy when trained with the selected features on a publicly available dataset. I analyse the accuracy, precision and recall of these classifiers with the chi-squared feature selection technique, and plot the relationship between the number of features and accuracy for the Naive Bayes and Maximum Entropy models. Our method also uses negation handling as a pre-processing step in order to achieve high accuracy.
Keywords: Sentiment Classification, Negation Handling, Sentiment Analysis, Feature Selection
1. Introduction
Sentiment analysis can be considered the use of natural language processing, text analysis and computational linguistics to identify and extract sentiment information from source materials.
Generally, sentiment analysis aims to determine the attitude of a writer with respect to some topic, or the overall contextual polarity of a document. The main task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature level, that is, deciding whether the opinion expressed in a document, a sentence or a feature is positive, negative, or neutral. Document-level sentiment analysis classifies the overall sentiment expressed by the reviewer in the whole document as positive, negative or neutral.
Sentiment classification techniques can be roughly divided into the machine learning approach, the lexicon-based approach and the hybrid approach [1]. The Machine Learning (ML) approach applies well-known ML algorithms and uses linguistic features. The lexicon-based approach relies on a sentiment lexicon, a collection of known sentiment terms; it is divided into the dictionary-based and corpus-based approaches, which use statistical or semantic methods to find sentiment polarity. The hybrid approach combines both. The accuracy of a sentiment analysis system is judged by how well it agrees with human judgments, which can be measured using precision and recall [2].
In this paper we compare the accuracy of two enhanced machine learning sentiment analysis methods, Naïve Bayes and Maximum Entropy models, each combined with the chi-squared feature selection technique and negation handling. Section 2 describes these two methods in detail, Section 3 presents the implementation details and results, and Section 4 concludes the paper.
2. Methodology
Our proposed system mainly consists of three modules:
A. Negation handling
B. Feature selection
C. Sentiment classification
3.1 Negation handling
A major problem in sentiment classification is the handling of negation, and negation handling is one of the factors that contributed significantly to the accuracy of our classifier.
Since each word is used as a feature, the word "good" in the phrase "not good" would contribute to positive rather than negative sentiment, which leads to classification errors. This type of error arises because the presence of "not" is not taken into account. To solve this problem we applied a simple algorithm for handling negations using state variables and bootstrapping, building on the idea of using an alternate representation of negated forms [3]. The algorithm stores the negation state in a state variable and transforms a word that follows "n't" or "not" into the form "not_" + word. Whenever the negation state variable is set, the words read are treated as "not_" + word. The state variable is reset when a punctuation mark is encountered or when there is a double negation.
Many words with strong sentiment occur only in their normal forms in the training set, even though their negated forms would also carry strong polarity. We solved this problem by adding negated forms to the opposite class, along with the normal forms, during the training phase. That is, if we encounter the word "bad" in a negative document during training, we increment the count of "bad" in the negative class and also increment the count of "not_bad" in the positive class. This ensures that there are enough "not_" forms for classification. Bootstrapping the negated forms during training in this way resulted in a significant improvement (1%) in classification accuracy.
Paper ID: SUB157922 252
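The state-variable transformation described for negation handling can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions: the tokenisation regex, the choice of "not" plus any word ending in "n't" as negation triggers, the punctuation set that ends a negation scope, and keeping the negation word itself as a feature. The function names `is_negation` and `mark_negations` are hypothetical. A second negation toggles the state off, which models the reset on double negation.

```python
import re

# Assumed scope terminators: any of these punctuation marks ends a negation.
PUNCTUATION = set(".,!?;:")


def is_negation(token: str) -> bool:
    """Assumed negation triggers: the word "not" or any word ending in "n't"."""
    return token == "not" or token.endswith("n't")


def mark_negations(text: str) -> list[str]:
    """Return word features, prefixing "not_" to words inside a negation scope."""
    # Normalise curly apostrophes, lowercase, then split into words and punctuation.
    text = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text)
    negated = False  # the negation state variable
    features = []
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # punctuation resets the state
            continue
        if is_negation(tok):
            # Set the state; a second negation while it is set resets it
            # (double negation). The negation word is kept as its own feature.
            negated = not negated
            features.append(tok)
            continue
        features.append("not_" + tok if negated else tok)
    return features


print(mark_negations("The plot is not good, but the acting isn't bad."))
# ['the', 'plot', 'is', 'not', 'not_good', 'but', 'the', 'acting', "isn't", 'not_bad']
```

Because the state stays set until the next punctuation mark, a phrase such as "not bad at all" yields "not_bad", "not_at" and "not_all"; a finer-grained scope rule would be a different design choice.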
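The bootstrapping of negated forms during training can be sketched as a per-class word count in which every normal-form word also increments its "not_" form in the opposite class. This is a minimal illustration under assumptions, not the paper's implementation: the labels "pos" and "neg", the hypothetical helpers `is_negatable` and `train_counts`, and the decision not to bootstrap tokens that are already negated or are negation words themselves are all choices made here for the example.

```python
from collections import Counter

OPPOSITE = {"pos": "neg", "neg": "pos"}  # assumed binary label names


def is_negatable(feat: str) -> bool:
    """Skip features that are already negated or are negation words themselves."""
    return not (feat.startswith("not_") or feat == "not" or feat.endswith("n't"))


def train_counts(docs: list[tuple[list[str], str]]) -> dict[str, Counter]:
    """Count features per class, bootstrapping negated forms into the other class.

    Each document is a (features, label) pair whose features are assumed to be
    negation-marked already (e.g. "not_good").
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for features, label in docs:
        for feat in features:
            counts[label][feat] += 1
            if is_negatable(feat):
                # "bad" in a negative review also counts as "not_bad" in the positive class.
                counts[OPPOSITE[label]]["not_" + feat] += 1
    return counts


counts = train_counts([(["bad", "plot"], "neg"), (["great", "not_boring"], "pos")])
print(counts["pos"]["not_bad"], counts["neg"]["not_great"])  # 1 1
```

Per-class counts like these could then feed the word-likelihood estimates of a Naive Bayes classifier; any smoothing applied on top of them is outside the scope of this sketch.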