I.J. Modern Education and Computer Science, 2018, 5, 54-62
Published Online May 2018 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijmecs.2018.05.07
Copyright © 2018 MECS

Efficient Feature Extraction in Sentiment Classification for Contrastive Sentences

Sonu Lal Gupta
Gautam Buddha University, Greater Noida, India-201308
Email: sonugupta2006@gmail.com

Anurag Singh Baghel
Gautam Buddha University, Greater Noida, India-201308
Email: anuragsbaghel@gmail.com

Received: 29 January 2018; Accepted: 10 April 2018; Published: 08 May 2018

Abstract—Sentiment classification is a task within sentiment analysis in which a text document is assigned to a category such as positive, negative, or neutral on the basis of the subjective information it contains. This subjective information, called sentiment features, largely determines how well sentiment classification performs. Feature extraction is therefore an essential task for sentiment classification at any level. This study explores the most relevant and crucial features for sentiment classification and groups them into seven categories: basic features, seed word features, TF-IDF, punctuation-based features, sentence-based features, N-grams, and POS lexicons. The paper proposes two new sentence-based features that help assign the overall sentiment of contrastive sentences, and on the basis of these features two algorithms are developed to find the sentiment of contrastive sentences. The TripAdvisor dataset is used to evaluate the proposed features. The results are compared with several state-of-the-art studies that use various features on the same dataset, and the proposed approach achieves superior performance.

Index Terms—Sentiment analysis, Sentiment classification, Contrastive sentences, Review subjectivity, Polarity detection, Machine learning, Lexicon.

I. INTRODUCTION

The Web is a pool of online information consisting of text data, i.e. facts and reviews or opinions about those facts. Facts are objective sentences that are based on proof and carry no sentiment, while opinions are subjective sentences that express the sentiments of different people towards entities. Processing opinions is commonly known as sentiment analysis, which has become highly popular in the last decade because of the rise of social media. It aims to determine the attitude of a person on the web towards particular topics, or the overall opinion expressed in a document. Sentiment classification is the task of labeling documents with categories such as positive, negative, or neutral according to the opinion information contained in the documents [1-2].

Sentiment classification may be broadly divided into three levels: document level, sentence level, and aspect/feature level [3-6]. Document-level classification treats the document as a single unit and specifies its polarity as positive or negative, while sentence-level classification considers the whole sentence and expresses its sentiment. Aspect-level analysis first identifies the entities and then the opinions about those entities.

Every text contains certain features that express its sentiment. Features may express sentiment implicitly or explicitly. For sentiment classification, a feature is a meaningful piece of information from the text, which could be a word, a combination of several words, or a full sentence, and which helps define the polarity of the text as a positive, negative, or neutral review. Feature extraction is a necessary step in sentiment classification; it extracts the most representative features, which are helpful in distinguishing classes [7]. Almost every text contains an enormous number of features, of which around seventy percent are irrelevant and create noise.
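As a rough illustration of what extracting features from a review sentence can look like, the sketch below computes a few simple features from three of the categories named in the abstract (seed word features, punctuation-based features, and N-grams). It is a minimal sketch, not the method of this paper: the seed word sets and the function names `tokenize`, `ngrams`, and `extract_features` are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical seed words, chosen only for illustration (not the paper's lexicon).
POSITIVE_SEEDS = {"good", "great", "clean", "friendly"}
NEGATIVE_SEEDS = {"bad", "dirty", "rude", "noisy"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the list of contiguous n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Build a small feature dictionary for one review sentence."""
    tokens = tokenize(text)
    return {
        "n_tokens": len(tokens),
        # Punctuation-based features: raw counts of '!' and '?'.
        "n_exclamations": text.count("!"),
        "n_questions": text.count("?"),
        # Seed word features: how many tokens appear in each seed set.
        "n_positive_seeds": sum(t in POSITIVE_SEEDS for t in tokens),
        "n_negative_seeds": sum(t in NEGATIVE_SEEDS for t in tokens),
        # N-gram features: bigram counts.
        "bigrams": Counter(ngrams(tokens, 2)),
    }


features = extract_features("The room was clean but the staff was rude!")
print(features["n_positive_seeds"], features["n_negative_seeds"])
```

For this example sentence the positive and negative seed counts are equal, which suggests why simple word counts alone may not settle the overall polarity of a sentence that contrasts two opinions.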
The main purpose of feature extraction is to find as many relevant features as possible, which can speed up the classification of data. The more accurate the feature extraction, the more accurate the sentiment analysis.

Sentiments are not always expressed explicitly in sentences. For example, in the sentence “how can anyone purchase this item?” the sentiment is negative but implicit, and no word in the sentence carries sentiment directly. The polarity of many words depends heavily on the domain; it cannot be fixed and changes from domain to domain. The presence of sarcasm and negative sentences is also a serious threat to accurate sentiment classification. Similarly, contrastive sentences pose a major challenge when determining the overall sentiment of a sentence. In this research, we propose two novel algorithms to assign the overall sentiment of contrastive sentences.

Sentiment analysis techniques are broadly categorized into three types: machine learning based, lexicons