International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 2276
Hybrid Classifier for Sentiment Analysis Using Effective Pipelining
Akhil Sharma [1], Aman Sharma [2], Rajeev Kumar Singh [3], Dr. Madhur Deo Upadhayay [4]
[1,2] Electronics and Communication Engineering, Shiv Nadar University, Uttar Pradesh, India
[3] Assistant Professor, Dept. of Electrical Engineering, Shiv Nadar University, Uttar Pradesh, India
[4] Assistant Professor, Dept. of Computer Science, Shiv Nadar University, Uttar Pradesh, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Social media has become a platform for people
to express their thoughts, opinions and ideas. Facebook,
Twitter, Google+ and the like have emerged as data hubs for
people wanting to improve market sales and to predict outcomes of
events and characteristics of human behavior. Polling and
surveys are outdated and lengthy techniques. With opinion
mining and sentiment analysis, data extraction and
classification become easy. In this paper, we use a
hybrid method for analyzing sentiments. This method employs
a pipeline consisting of rules-based, lexicon-based and machine
learning based classifiers, in which a tweet, after undergoing
preprocessing, is first classified by the lexicon and rules
classifiers and is sent to the machine learning module only if
the tweet's analysis score does not reach a predetermined
threshold value. The individual rules, lexicon, and machine
learning approaches are compared with the hybrid
classifier on the basis of F-score, recall and precision.
Key Words: opinion mining, sentiment analysis, rules-
based, lexicon-based, classifier, hybrid approach.
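The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon, the single negation rule, the threshold of 0.5, the use of the absolute score, and the names lexicon_rule_score, ml_classify and hybrid_classify are all hypothetical, and the machine learning stage is only a stub.

```python
# Hypothetical toy lexicon and rule set; the paper's actual resources differ.
LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0,
           "bad": -1.0, "terrible": -1.0, "hate": -1.0}
NEGATIONS = {"not", "no", "never"}
THRESHOLD = 0.5  # assumed value of the predetermined threshold


def lexicon_rule_score(tweet):
    """Sum word polarities; a negation flips the next opinion word (toy rule)."""
    score = 0.0
    negate = False
    for token in tweet.lower().split():
        word = token.strip(".,!?\"'")
        if word in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0.0)
        if polarity != 0.0:
            score += -polarity if negate else polarity
            negate = False
    return score


def ml_classify(tweet):
    """Stub standing in for a trained machine learning classifier."""
    return "neutral"


def hybrid_classify(tweet, ml=ml_classify):
    """Return (label, stage): use lexicon + rules first, fall back to ML."""
    score = lexicon_rule_score(tweet)
    if abs(score) >= THRESHOLD:
        label = "positive" if score > 0 else "negative"
        return label, "lexicon+rules"
    # Score did not reach the threshold: defer to the ML module.
    return ml(tweet), "machine learning"
```

In this sketch a tweet such as "This is not good" is settled by the first stage, while a tweet with no recognized opinion words is passed on to whatever classifier is supplied as ml.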
1. INTRODUCTION
Twitter is one of the most popular microblogging and social
networking websites [4]. People from time to time post on
Twitter, an activity called tweeting. The diversity of people
on Twitter makes the tweets more versatile and valuable [7].
Therefore, Twitter becomes one of the most valuable places
to find opinions on any issue. This allows computer scientists
to perform credible sentiment analysis and develop
pathways for data mining. This data can be used in
marketing, sales or poll analysis. Timely feedback on
products can be collected by evaluating people's tweets on
Twitter [1,2,3].
Researchers can use the data sets to build unsolicited public
opinion polls on important social matters [1]. Social media
becomes a powerful tool for the general public to get closely
involved with politics, media and business. Polls are
expensive and time-consuming [1,2]. With continued
improvement in data analysis techniques, these tasks have
become practically viable. The credibility of data and results
is higher than before. Manual surveys and polls are not
always trustworthy, whereas there is significantly less
scope for human error in data mining and
subsequent analysis. The political inclinations and interests of
the general public become available for parties to understand
and to prepare for their campaigns. The needs of people and
complaints from society become accessible to
politicians. The gap between the government and the public can
be bridged with ease. Predictions pertaining to elections or
major events can also be extracted in one go [1].
After any incident, protest or social unrest, people log into
social media websites to post or to make a comment in order
to express their thoughts and opinions. Social media is
powerful in terms of spreading social awareness about
crimes, diseases, and other epidemics. Twitter has become a
solid and trustworthy resource not only for its users but also
for researchers. The consolidated data can give clear pictorial
trends regarding people's opinions. An unprecedented view
of public opinion is displayed on social media, especially on Twitter
[1].
Sentiment analysis is the field of study of how sentiments
and opinions are expressed in texts. Approaches
used to classify sentiments include rules-based, lexicon-based,
machine learning and deep learning techniques
[2,3,10,11]. Classifying tweets on the basis of
pre-fixed rules is called the rules-based approach. Using
opinion words, or the lexicon, to determine opinion
orientations is called the lexicon-based approach [1,5]. The
rules-based approach combined with the lexicon-based approach has high
precision but low recall [2]. Emoticons, informal language
and abbreviations are some of the parts of unstructured
textual data that may go undetected or unclassified in the
lexicon-based approach. For example, “Mauritius is a gr8
holiday destination,” is a sentence with a positive sentiment.
However, a classifier using the lexicon-based approach might
classify it as neutral or leave it unclassified. Although it is possible to add
such expressions to the opinion lexicon, their usage changes
continuously, which makes them hard to classify [2].
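The limitation described above can be illustrated with a small sketch. It assumes a toy opinion lexicon and a hand-made slang table (OPINION_LEXICON, SLANG and lexicon_label are hypothetical names, not part of any cited system): an abbreviated spelling such as "gr8" is invisible to a plain lexicon lookup, and it is only recognized once an entry for it has been added by hand.

```python
# Hypothetical toy opinion lexicon: word -> polarity.
OPINION_LEXICON = {"great": 1, "good": 1, "bad": -1, "awful": -1}

# Hypothetical slang table; it has to be maintained as usage changes.
SLANG = {"gr8": "great"}


def lexicon_label(sentence, slang=None):
    """Label a sentence by summing the polarities of its known words."""
    slang = slang or {}
    total = 0
    for token in sentence.lower().split():
        word = token.strip(".,!?\"'")
        word = slang.get(word, word)  # normalize known abbreviations
        total += OPINION_LEXICON.get(word, 0)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


example = "Mauritius is a gr8 holiday destination,"
plain = lexicon_label(example)                  # no slang handling
normalized = lexicon_label(example, slang=SLANG)  # with the slang table
```

Without the slang table the example sentence contains no word from the lexicon, so the lookup falls through to the neutral label; with the table the abbreviation is mapped to a known opinion word first.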
Another method that is used for sentiment analysis is the
machine learning approach [4,8]. This method is effective for
classification of sentences and documents by training the
classifier to determine positive, negative and neutral
sentiments [4,8]. Since manual labelling of a large set of tweets
is often time-consuming and difficult, this approach is not
easy to implement [2]. Deep learning algorithms could also
provide the most accurate results, but these techniques are
extremely computationally expensive to train. To handle
the large number of matrix multiplication operations that
deep learning involves, substantial investment is needed to
upgrade the IT infrastructure for more processing power.
Moreover, deep learning requires an immense amount of data
to train the model as compared to traditional machine