2016 International Conference on Computational Systems and Information Systems for Sustainable Solutions
An Unsupervised Fuzzy Clustering Method for
Twitter Sentiment Analysis
Hima Suresh
Research Scholar, School of Computer Sciences
Mahatma Gandhi University
Kottayam, India
Abstract- Cluster-based techniques for sentiment analysis are a
novel approach to analyzing sentiments expressed on social
media sites. Clustering is a main task of exploratory data mining
and a common technique in machine learning. In contrast to
supervised learning techniques, cluster-based techniques
produce essentially accurate experimental results without manual
processing, linguistic knowledge, or training time. This paper
presents a novel fuzzy clustering model to analyze Twitter feeds
for the sentiments expressed about a particular brand, using a real
dataset collected over a period of one year. A comparative
analysis is then made against the existing partitioning clustering
techniques, namely K-Means and Expectation Maximization,
on the metrics of accuracy, precision, recall,
and execution time. According to the experimental analysis, the
proposed approach is practicable for producing high-quality
Twitter sentiment analysis results.
Keywords- Sentiment Analysis (SA); partitioning clustering
techniques; Expectation Maximization (EM); Simple K-Means.
I. INTRODUCTION
Twitter is one of the most popular micro-blogging websites,
on which users can send and receive short 140-character
messages called tweets. It provides a means for
empirically analyzing the properties of interactions between people. The
information extracted from Twitter may include opinions
on different topics such as politics, brand impact, and
elections. With the advent of machine learning techniques,
decision makers can obtain efficient solutions for a plethora
of problems.
Many researchers have attempted to find a technique to
automatically analyze the sentiment orientation of documents,
especially reviews, blogs, etc. These efforts can be categorized
into two different machine learning techniques: (i)
supervised machine learning and (ii) unsupervised
machine learning. Although supervised machine learning
enjoys relatively high efficiency compared to unsupervised
machine learning, its processing requires manual participation.
Unsupervised machine learning, on the other hand, does not
demand manual involvement, but its accuracy can be limited.
This paper focuses on the Twitter sentiment analysis of a
brand using partition-based clustering techniques.
Dr. Gladston Raj. S
Head, Department of Computer Science
Govt. College, Nedumangadu
Trivandrum, India
Partition-based clustering is an unsupervised
machine learning technique. Two such techniques (K-Means
and EM) are compared with the proposed method on the
brand information collected from tweets. We observe that the
proposed fuzzy clustering method provides better results in
terms of accuracy and execution time than the other
two partition-based techniques.
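To make the idea of fuzzy (soft) cluster membership concrete, the following is a minimal sketch of the standard fuzzy c-means algorithm in plain Python. It is not the modified method proposed in this paper, whose details are given in the methodology section. The function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, and the toy two-dimensional feature vectors are assumptions made only for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative; not the paper's modified variant).

    points: list of equal-length numeric tuples.
    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i (each row sums to 1).
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships, normalized so each row sums to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Update each center as a membership-weighted mean (weights u^m).
        centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))

        # Update memberships from the distances to the new centers.
        max_change = 0.0
        for k, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point coincides with a center: assign it fully there.
                hit = dists.index(0.0)
                new_row = [1.0 if i == hit else 0.0 for i in range(n_clusters)]
            else:
                new_row = [
                    1.0 / sum((dists[i] / dists[j]) ** exponent
                              for j in range(n_clusters))
                    for i in range(n_clusters)
                ]
            change = max(abs(a - b) for a, b in zip(new_row, memberships[k]))
            max_change = max(max_change, change)
            memberships[k] = new_row

        if max_change < tol:
            break

    return centers, memberships


# Hypothetical 2-D feature vectors forming two well-separated groups.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fuzzy_c_means(data, n_clusters=2)

# A hard label can be read off as the cluster with the largest membership.
hard_labels = [row.index(max(row)) for row in memberships]
```

Unlike K-Means, which assigns each point to exactly one cluster, every point here carries a degree of membership in every cluster, which may suit tweets whose sentiment is mixed or ambiguous.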
The main contributions of our work include the collection of a real
dataset of 300 sample tweets from the Twitter API over a
period of one year, from 2015 to 2016, regarding a particular
brand, the Samsung Galaxy S6. A modified fuzzy
clustering method is then proposed, and a
comparative analysis is carried out against the existing partition-based
methods, namely K-Means and Expectation Maximization.
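As an illustration of how accuracy, precision, and recall might be computed for a clustering output, the sketch below maps each cluster to the majority gold label among its members and then scores the result. The majority-vote mapping, the function names, and the toy labels are assumptions for illustration; they are not necessarily the evaluation procedure used in this paper.

```python
from collections import Counter


def map_clusters_to_labels(cluster_ids, gold_labels):
    """Assign each cluster the most common gold label among its members."""
    by_cluster = {}
    for cid, gold in zip(cluster_ids, gold_labels):
        by_cluster.setdefault(cid, Counter())[gold] += 1
    return {cid: counts.most_common(1)[0][0]
            for cid, counts in by_cluster.items()}


def evaluate(cluster_ids, gold_labels, positive="positive"):
    """Return accuracy, precision, and recall for the `positive` class."""
    mapping = map_clusters_to_labels(cluster_ids, gold_labels)
    predicted = [mapping[cid] for cid in cluster_ids]
    pairs = list(zip(predicted, gold_labels))

    tp = sum(1 for p, g in pairs if p == positive and g == positive)
    fp = sum(1 for p, g in pairs if p == positive and g != positive)
    fn = sum(1 for p, g in pairs if p != positive and g == positive)
    correct = sum(1 for p, g in pairs if p == g)

    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical cluster assignments and hand-labeled sentiments for six tweets.
cluster_ids = [0, 0, 0, 1, 1, 1]
gold_labels = ["positive", "positive", "negative",
               "negative", "negative", "positive"]
result = evaluate(cluster_ids, gold_labels)
```

In this toy case cluster 0 is mapped to "positive" and cluster 1 to "negative", so four of the six tweets are scored as correct. Execution time, the fourth metric, could be measured separately, for example by wrapping each clustering call with `time.perf_counter()`.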
The remainder of this paper is organized as follows.
Section II discusses related works. Section III presents
the methodology. Experimental analysis and results are described
in Section IV, and the conclusion and future work are discussed in
Section V.
II. RELATED WORKS
This section discusses related works in the specific area of
Twitter sentiment analysis.
Masashi et al. [1] proposed an aspect identification method
for analyzing sentiments in review documents. They applied
non-tagged data and a clustering approach to address the problem
of the amount of training data: similar sentences were first
classified into clusters, and then the aspects of sentences close to
the centroid of each cluster were tagged. They identified the
aspects of sentences in test data using an SVM, reporting a figure of 73.9%.
Shahana et al. [2] selected features from a high-dimensional
feature set using feature selection techniques
such as information gain, TF-IDF, Chi-square, and mutual
information. The methods were evaluated on a movie review
dataset collected from websites. Performance was evaluated using
SVM and the Weka tool. They showed that unigrams using
stemming with stop words give high accuracy.
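For readers unfamiliar with TF-IDF weighting, one of the feature measures named above, a minimal sketch follows. It uses length-normalized term frequency and an unsmoothed logarithmic inverse document frequency, which is only one of several common variants and is not necessarily the one used in [2]. The function name and the toy tokenized documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    tf  = term count in the document / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(tokenized_docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))

    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized review snippets.
docs = [
    ["great", "phone", "great", "camera"],
    ["bad", "battery"],
    ["great", "battery"],
]
weights = tf_idf(docs)
```

In the first document, "phone" receives a higher weight than "great" even though "great" occurs twice, because "great" also appears in another document and is therefore less distinctive.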
Deepa et al. [3] performed aspect-based sentiment analysis
on movie reviews. The aspect as well as sentiment detection
using clustering, review-guided clustering and manual labeling
978-1-5090-1022-6/16/$31.00 ©2016 IEEE