I.J. Modern Education and Computer Science, 2018, 5, 54-62
Published Online May 2018 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijmecs.2018.05.07
Copyright © 2018 MECS
Efficient Feature Extraction in Sentiment
Classification for Contrastive Sentences
Sonu Lal Gupta
Gautam Buddha University, Greater Noida, India-201308
Email: sonugupta2006@gmail.com
Anurag Singh Baghel
Gautam Buddha University, Greater Noida, India-201308
Email: anuragsbaghel@gmail.com
Received: 29 January 2018; Accepted: 10 April 2018; Published: 08 May 2018
Abstract—Sentiment Classification is a special task of
sentiment analysis in which a text document is assigned
to a category such as positive, negative, or neutral on
the basis of the subjective information it contains.
This subjective information, referred to as sentiment
features, largely determines how effective sentiment
classification is. Feature extraction is therefore an
essential task for sentiment classification
at any level. This study explores the most relevant and
crucial features for sentiment classification and groups
them into seven categories: Basic features,
Seed word features, TF-IDF, Punctuation based features,
Sentence based features, N-grams, and POS lexicons.
This paper proposes two new sentence based features
that help in assigning the overall sentiment of
contrastive sentences. On the basis of the proposed
features, two algorithms are developed to find the
sentiment of contrastive sentences. The TripAdvisor
dataset is used to evaluate the proposed features.
The results are compared with several state-of-the-art
studies that use various features on the same dataset,
and the proposed features achieve superior performance.
Index Terms—Sentiment analysis, Sentiment
classification, Contrastive sentences, Review subjectivity,
Polarity detection, Machine learning, Lexicon.
I. INTRODUCTION
The Web is a pool of online information consisting
of text data, i.e., facts and reviews or opinions about those
facts. Facts are objective sentences that are based on
proof and carry no sentiment, while opinions are
subjective sentences that convey the different
sentiments of different people towards entities.
Processing opinions is commonly known as
sentiment analysis, which has become highly popular
in the last decade because of the rise of social media. It
aims to determine the attitude of a person on the web
towards particular topics, or the overall opinion of a
document. Sentiment classification is the task of labeling
documents with categories such as positive,
negative, or neutral according to the opinion
information contained in the documents [1-2]. Sentiment
classification may be broadly divided into three levels,
namely document level, sentence level, and
aspect/feature level [3-6]. Document level analysis labels
the document polarity as positive or negative, treating the
document as a single unit, while sentence level analysis
assigns a sentiment to each sentence as a whole. Aspect
level analysis first identifies the entities and
then the opinions expressed about those entities.
Every text contains features that express
its sentiment. Features may express
sentiment implicitly or explicitly. For sentiment
classification, a feature is a piece of meaningful
information from the text, which could be a word, a
combination of several words, or a full sentence, and
which indicates the polarity of the text as
positive, negative, or neutral. Feature
extraction is a necessary step in sentiment classification
to extract the most representative features, which
help in distinguishing classes [7]. Almost every text
contains a very large number of features, of which around
seventy percent are irrelevant and create noise. The
main purpose of feature extraction is to find as many
relevant features as possible, which can speed up the
process of classification. The more accurate the extraction
of features, the more accurate the sentiment
analysis.
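As a minimal sketch of what such features can look like in practice, the Python snippet below turns one review text into word unigram counts, word bigram counts, and two punctuation counts. It is an illustration only and not the feature extraction procedure used in this paper; the function name extract_features, the feature key prefixes, and the simple regular-expression tokenizer are assumptions made for the example.

```python
import re
from collections import Counter


def extract_features(text):
    """Return a bag of simple illustrative features for one review text."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    features = Counter()
    # Unigram features: one count per word.
    for tok in tokens:
        features["uni=" + tok] += 1
    # Bigram features: one count per pair of adjacent words.
    for first, second in zip(tokens, tokens[1:]):
        features["bi=" + first + "_" + second] += 1
    # Punctuation based features: counts of '!' and '?'.
    features["n_exclaim"] = text.count("!")
    features["n_question"] = text.count("?")
    return features


feats = extract_features("Great location, great staff!")
```

Even for this four-word review the feature set already holds eight entries, which hints at why selecting the relevant features matters for longer texts.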
Sentiments are not always expressed explicitly in
sentences. For example, in the sentence “how can anyone
purchase this item?” the sentiment is negative but implicit,
and no word in it carries sentiment directly. The polarity of
many words depends heavily on the domain: it
cannot be fixed and changes from domain to domain.
The presence of sarcasm and negative sentences is also a
major obstacle to accurate sentiment classification. Similarly,
contrastive sentences make it challenging to determine the
overall sentiment of a sentence. In this research, we
propose two novel algorithms to assign the overall
sentiment of contrastive sentences.
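To illustrate why contrastive sentences are difficult, the following Python sketch scores a sentence by summing word polarities from a tiny lexicon. It is not one of the algorithms proposed in this paper; the lexicon TOY_LEXICON, the functions naive_score and clause_scores, and the choice of "but" as the contrastive connective are all hypothetical and serve only to show the problem.

```python
import re

# Hypothetical toy lexicon, assumed for illustration only.
TOY_LEXICON = {"nice": 1, "great": 1, "terrible": -1, "dirty": -1}


def naive_score(sentence):
    """Sum the lexicon polarities of all words in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(tok, 0) for tok in tokens)


def clause_scores(sentence, connective="but"):
    """Score each clause separately after splitting at the connective."""
    pattern = r"\b" + re.escape(connective) + r"\b"
    parts = re.split(pattern, sentence, flags=re.IGNORECASE)
    return [naive_score(part) for part in parts]


sentence = "The room was nice but the service was terrible"
overall = naive_score(sentence)      # opposite polarities cancel out
per_clause = clause_scores(sentence)  # each clause keeps its own polarity
```

Here the whole-sentence score is 0 although both clauses are clearly opinionated, so a plain sum cannot say which clause should decide the overall sentiment.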
Sentiment analysis techniques are broadly categorized
into three groups: machine learning based, lexicons