International Journal of Scientific & Engineering Research Volume 8, Issue 6, June-2017 1155
ISSN 2229-5518
IJSER © 2017
http://www.ijser.org
Building Sentiment analysis Model using
Graphlab
Mona Mohamed Nasr, Essam Mohamed Shaaban, and Ahmed Mostafa Hafez
Abstract—Sentiment analysis, also called opinion mining, is the field of study that analyzes people’s opinions, sentiments, evaluations,
appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their
attributes. Motivated by the importance of sentiment analysis for individuals and, more specifically, for large organizations, this paper
builds sentiment models using GraphLab. Several algorithms, namely SVM, logistic regression, and boosted trees, were used along with text
feature selection techniques to predict positive and negative sentiments. The classifiers were applied to a hotel reviews dataset obtained
from the TripAdvisor website to emulate real customer opinions. The results showed that the SVM classifier combined with the N-grams
feature selection technique was superior to the others.
Keywords—Classification, Feature Selection, Support Vector Machine (SVM), Logistic Regression, Decision trees.
1 INTRODUCTION
The revolution of social media (e.g., reviews, forum
discussions, blogs, microblogs, Twitter, and social
networks) makes it easy to find reviews of any
product. Hence the need for analyzing sentiments (reviews)
has emerged. Sentiment analysis, also called opinion
mining, is the field of study that analyzes people’s
opinions, sentiments, evaluations, appraisals, attitudes, and
emotions towards entities such as products, services,
organizations, individuals, issues, events, topics, and their
attributes [1]. In recent years, many researchers have built
sentiment models to analyze product reviews and classify
them into positive and negative sentiments. Ortigosa et
al. [2] proposed a hybrid approach that combines
lexical-based and machine-learning techniques. The results
showed that it is feasible to perform sentiment analysis in
Facebook with high accuracy (83.27%). Parkhe and
Biswas [3] focused on aspect-based sentiment analysis of
movie reviews in order to find the aspect-specific
driving factors. These factors are the scores given to various
movie aspects; generally, aspects with high driving
factors influence the polarity of the review the most. They
relied on lexicons, part-of-speech (POS) tags, and Naïve Bayes
and SVM classifiers. The results showed that giving high driving
factors to the Movie, Acting, and Plot aspects of a movie
yielded the highest accuracy in the analysis of movie
reviews, about 79.372%. Nagamma et al. [4] applied sentiment
analysis to study the relationship between the online
reviews of a movie and the movie's box office revenue
performance. They used a hybrid approach that combines
Term Frequency (TF) and Inverse Document Frequency
(IDF) values as features along with Fuzzy Clustering and a
Support Vector Machine (SVM) classifier for predicting the
trend of the box office revenue from the review sentiment.
The results showed that clustering the reviews improved
the accuracy of the SVM classifier from 62% (without
clustering) to 89.65% (with clustering), while the NB
classifier gave an accuracy of 72.41% under both
conditions. Hegde & Padma [5] applied a
case study of Kannada sentiment analysis (SA) for mobile
product reviews. They used a lexicon-based method for
aspect extraction. Furthermore, the Naive Bayes
classification model was applied to analyze the polarity of
the sentiment due to its computational simplicity and
stochastic robustness. For this purpose, a customized
corpus was developed. Their preliminary results indicate
that this approach is an efficient technique, achieving 65%
accuracy for Kannada SA.
In this paper, a sentiment model was built using SVM,
decision trees, and logistic regression on a hotel
reviews dataset crawled from TripAdvisor, after
converting it from web form to
CSV form. All models were built using IPython
Notebook with the GraphLab module and the SFrame package. The
results show that the SVM-based sentiment model with
N-grams features is superior to the others.
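
To make the pipeline described above more concrete, the following is a minimal sketch of reading reviews from CSV form, deriving a binary sentiment label, and extracting word N-gram counts. It uses only the Python standard library and not GraphLab itself. The column names `review` and `rating`, the sample rows, the rating-to-label rule, and the helper names `load_reviews` and `ngram_counts` are all assumptions made for illustration; they are not taken from the paper's dataset or code.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in CSV form; the column names "review" and "rating"
# are assumptions, not the schema of the paper's dataset.
SAMPLE_CSV = '''review,rating
"The room was clean and the staff were friendly",5
"Dirty room and rude staff",1
"Average stay nothing special",3
'''


def load_reviews(csv_text):
    """Parse CSV text into (review_text, label) pairs.

    Assumed labeling rule: rating >= 4 is positive (1), rating <= 2 is
    negative (0), and neutral ratings of 3 are dropped.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rating = int(row["rating"])
        if rating == 3:
            continue  # skip neutral reviews
        rows.append((row["review"], 1 if rating >= 4 else 0))
    return rows


def ngram_counts(text, n=2):
    """Count word n-grams of every length from 1 to n.

    Tokenization is a simple lowercase whitespace split.
    """
    tokens = text.lower().split()
    counts = Counter()
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            counts[" ".join(tokens[i:i + size])] += 1
    return counts


reviews = load_reviews(SAMPLE_CSV)
# One (feature dictionary, label) pair per review, ready for a classifier.
features = [(ngram_counts(text, n=2), label) for text, label in reviews]
```

Each review becomes a sparse dictionary of unigram and bigram counts, which is the kind of input a linear classifier such as an SVM or logistic regression can consume.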
2 IMPLEMENTATION PACKAGE
During the implementation phase, IPython Notebook
with GraphLab Create was used because it scales to much larger
data than other available resources such as Pandas.