2016 International Conference on Computational Systems and Information Systems for Sustainable Solutions

An Unsupervised Fuzzy Clustering Method for Twitter Sentiment Analysis

Hima Suresh
Research Scholar, School of Computer Sciences
Mahatma Gandhi University
Kottayam, India

Abstract- Cluster-based sentiment analysis is a novel approach to analyzing sentiments expressed on social media sites. Clustering is a main task of exploratory data mining and a common technique in machine learning. In contrast to supervised learning techniques, cluster-based techniques produce essentially accurate experimental results without manual processing, linguistic knowledge or training time. This paper presents a novel fuzzy clustering model to analyze Twitter feeds regarding the sentiments toward a particular brand, using a real dataset collected over a period of one year. A comparative analysis is then made with the existing partitioning clustering techniques, namely K-Means and Expectation Maximization, based on accuracy, precision, recall and execution time. According to the experimental analysis, the proposed approach is practicable for producing high-quality Twitter sentiment analysis results.

Keywords- Sentiment Analysis (SA); partitioning clustering techniques; Expectation Maximization (EM); Simple K-Means.

I. INTRODUCTION

Twitter is one of the most popular micro-blogging web sites, on which users send and receive short messages of up to 140 characters called tweets. It provides a means for empirically analyzing the properties of interactions among people. The information extracted from Twitter could be opinions relating to different topics such as politics, brand impact, elections etc. With the advent of machine learning techniques, decision makers could obtain efficient solutions for a plethora of problems.
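As a rough illustration of what separates fuzzy clustering from hard partitioning such as K-Means, the sketch below computes soft membership degrees for a single data point. It uses the standard fuzzy c-means membership rule, which is an assumption made here for illustration; it is not the modified method proposed in this paper. The one-dimensional sentiment scores, the cluster centers and the function names are likewise hypothetical.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Return the degree of membership of `point` in each cluster.

    Standard fuzzy c-means rule (assumed here, not the paper's variant):
        u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))
    where d_j is the Euclidean distance from the point to center j
    and m > 1 is the fuzzifier.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [j for j, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists]


def hard_assignment(point, centers):
    """K-Means-style assignment: index of the single nearest center."""
    return min(range(len(centers)),
               key=lambda j: math.dist(point, centers[j]))


# Hypothetical 1-D sentiment scores: negative, neutral, positive centers.
centers = [(-1.0,), (0.0,), (1.0,)]
tweet = (0.6,)  # a mildly positive tweet, made up for illustration

memberships = fuzzy_memberships(tweet, centers)
print([round(u, 3) for u in memberships])  # soft degrees, summing to 1
print(hard_assignment(tweet, centers))     # one cluster index only
```

The hard assignment reports only the nearest cluster, whereas the membership vector also records how strongly the tweet leans toward the neutral cluster. That extra information is what a fuzzy method can exploit for borderline tweets.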
Many researchers have attempted to find a technique to automatically analyze the sentiment orientation of documents, especially reviews, blogs etc. These attempts can be categorized into two different machine learning techniques: (i) supervised machine learning and (ii) unsupervised machine learning. Although supervised machine learning enjoys a relatively high efficiency compared to unsupervised machine learning, its processing requires manual participation. Unsupervised machine learning, on the other hand, does not demand manual involvement, but its accuracy could be limited. This paper focuses on the Twitter sentiment analysis of a brand using partition-based clustering techniques.

Dr. Gladston Raj S
Head, Department of Computer Science
Govt. College, Nedumangadu
Trivandrum, India

Partition-based clustering is an unsupervised machine learning technique. Two such techniques (K-Means and EM) are compared with the proposed method on brand information collected from tweets, and it is observed that the proposed fuzzy clustering method provides better results in terms of accuracy and execution time than the other two partition-based techniques.

The main contributions of our work include the collection of a real dataset of 300 sample tweets from the Twitter API over a period of one year, from 2015 to 2016, regarding a particular brand, the Samsung Galaxy S6. A modified fuzzy clustering method is then proposed, and a comparative analysis is carried out against the existing partition-based methods, namely K-Means and Expectation Maximization.

The remainder of this paper is organized as follows: Section II discusses related works. Section III presents the methodology. Experimental analysis and results are described in Section IV, and the conclusion and future work are discussed in Section V.

II. RELATED WORKS

This section discusses related works in the specific area of Twitter sentiment analysis.
Masashi et al [1] proposed an aspect identification method for analyzing sentiments in review documents. They applied non-tagged data and a clustering approach to address the problem of the amount of training data: similar sentences were first classified into clusters, and then the aspects of the sentences close to the centroid of each cluster were tagged. They identified the aspect of sentences in test data using SVM with 73.9%.

Shahana et al [2] selected features from a high-dimensional feature set using feature selection techniques such as information gain, TF-IDF, Chi-square and mutual information. The methods were evaluated over a movie review dataset from websites. The performance was evaluated using SVM and the Weka tool. They reported that unigrams using stemming with stop words give high accuracy.

Deepa et al [3] performed aspect-based sentiment analysis on movie reviews. The aspect as well as sentiment detection using clustering, review guided clustering and manual labeling

978-1-5090-1022-6/16/$31.00 ©2016 IEEE