International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 10 | Oct 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 734
Survey of Classification of Business Reviews Using Sentiment Analysis
Shilpa A. Shendre¹, Prof. Pramila M Chawan²
¹M.Tech Student, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India
²Associate Professor, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
ABSTRACT:- The rapid growth of unstructured textual data,
accompanied by the proliferation of tools to analyze it, has
opened up great opportunities and challenges for text
research. The research area of sentiment analysis has gained
popularity in recent years. Business developers want to know
not only about their product marketing and profit based on
the number of sales made, but also about the reviews and
thoughts of the people using these products. The feedback
they receive via social media and other internet services
becomes very important for measuring the quality of the
products they offer. Sentiment analysis is a domain in which
the analysis focuses on extracting the feedback and opinions
of users towards a particular topic from structured or
unstructured textual data. In this paper, we focus our effort
on sentiment analysis of the Yelp challenge database. We
examine the sentiment expressed in business reviews to
classify each review as positive or negative, perform feature
extraction, and use these features for updating and
maintaining the business.
Key Words: sentiment analysis; opinion mining;
classification; text reviews; machine learning
1. INTRODUCTION
Sentiment analysis has become an important research area
for understanding people’s opinions on a matter by analyzing
a huge amount of information. In the present era, the
Internet has become a huge cyber database that hosts a
gigantic amount of data created and consumed by its users.
People across the world share their views about various
services or products using social networking sites, blogs or
popular review sites. The Internet has been growing at an
exponential rate, giving rise to communication across the
globe in which people express their views on social media
such as Facebook, Twitter, Rotten Tomatoes and Foursquare.
Opinions expressed in the form of reviews provide a basis for
new explorations into the collective views of people. One
such domain is business reviews, which affect business
people. Feedback from customers is valuable for companies
that want to analyze their customers’ satisfaction and survey
their competitors. It is also useful for consumers who want
to assess a product or a service prior to making a purchase.
In this paper, we present the results of machine learning
algorithms for classifying reviews using sentiment analysis. A
large number of customer-generated reviews for businesses
and service providers are classified as either positive or
negative. We propose a method to automatically classify
customer sentiment using only the text of business reviews.
This lets us generate results from feedback without manual
intervention. From the rating alone, it is very difficult to
judge why a user has rated a product 1 or 5 stars. The text
content, however, offers more to analyze than the rating
itself.
In this paper, we also describe the preprocessing steps
required to achieve accuracy in the classification task.
There is no previous research available on classifying the
sentiment of business reviews using the latest reviews from
the Yelp dataset. Determining the underlying sentiment of a
business review is a difficult task, given several factors
such as the connotation of a word depending on context, the
language used, and the ambiguity that arises when words
don’t express a particular sentiment or when sarcasm is
used. We show that a sentiment analysis algorithm built on
top of machine learning algorithms such as Naïve Bayes and
Linear Support Vector Classification (SVC) achieves accuracy
above 90% on business reviews.
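As an illustration of the kind of text-only classification described above, the following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, written in plain Python. It is not the implementation evaluated in this paper: the names `tokenize` and `NaiveBayes` are hypothetical, the four training reviews are made up rather than drawn from the Yelp dataset, and Linear SVC is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up reviews (assumption: not taken from the Yelp dataset).
train_texts = [
    "great food and friendly staff",
    "loved the service, will come again",
    "terrible food and rude staff",
    "slow service, never again",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("friendly staff and great service")
print(prediction)
```

With so few training reviews the result only demonstrates the mechanics. A realistic experiment would use a held-out test split and a much larger vocabulary.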
1.1 Feature Selection
Most researchers apply standard feature selection in their
approach to improve performance, with few using more
practical approaches. Works that focus entirely on feature
selection to improve sentiment analysis are few. One of them
is the well-known work of Pang & Lee, who removed objective
sentences on a test bed consisting of objective and
subjective text, with classification trained on SVM.
Initially, they found that the sentiment classification
result was only moderate. They then concluded it was more
likely that sentences adjacent to discarded sentences
improved the classification result over their baseline.
1.2 Information Gain
Another work used sophisticated feature selection and found
that using either information gain (IG) or a genetic
algorithm (GA) results in an improvement in accuracy. Let D
be a dataset of labeled texts. Let pD represent the
probability that a random text in D is classified as
positive. The classification should be fairly simple if the
set is heavily biased towards positive or negative instances.
On the contrary, if the set is evenly distributed, with equal
likelihood of positive and negative instances, then the task
is difficult. The disorder in the set D is calculated by its
entropy: