International Journal on Data Science and Technology 2016; 2(4): 41-45
http://www.sciencepublishinggroup.com/j/ijdst
doi: 10.11648/j.ijdst.20160204.11
ISSN: 2472-2200 (Print); ISSN: 2472-2235 (Online)

Comparative Twitter Sentiment Analysis Based on Linear and Probabilistic Models

Kiplagat Wilfred Kiprono, Elisha Odira Abade
School of Computing and Informatics, University of Nairobi, Nairobi, Kenya

Email address: Wilkiprono@gmail.com (K. W. Kiprono), eabade@uonbi.ac.ke (E. O. Abade), elisha.abade@gmail.com (E. O. Abade)

To cite this article: Kiplagat Wilfred Kiprono, Elisha Odira Abade. Comparative Twitter Sentiment Analysis Based on Linear and Probabilistic Models. International Journal on Data Science and Technology. Vol. 2, No. 4, 2016, pp. 41-45. doi: 10.11648/j.ijdst.20160204.11

Received: June 12, 2016; Accepted: June 23, 2016; Published: August 1, 2016

Abstract: The transition from Web 1.0 to Web 2.0 has enabled direct interaction between users and the web's various resources and services, such as social media networks. In this paper we analyze sentiment analysis algorithms that can be used to exploit this large volume of information. The goal of this paper is to devise a way of obtaining opinions from social networks, extracting features from unstructured text, and assigning each feature its associated sentiment in a clear and efficient way. In this project we applied naïve Bayes, support vector machines and maximum entropy, and produced a qualitative and quantitative comparison of the three. We performed the project empirically and analyzed the resulting data using an Excel tool to obtain a comparative analysis of the three classification algorithms.

Keywords: POS, SVM, MaxEnt, Naive Bayes, Feature Selection, Sentiment Classification, N-grams, Bigrams, Unigrams, Trigrams

1. Introduction

Direct interaction between users and the web environment has made a huge amount of information available on the internet.
Social media networks such as Twitter, Facebook, LinkedIn and WhatsApp have enabled people to share opinions in real time. Companies and business organizations around the world, including in Kenya, have taken advantage of these platforms to advertise, make sales and gather product reviews. Amazon, eBay, Google Shopping and OLX are examples, and the number of reviews, especially for popular products, grows rapidly. Thus, people's opinions inform decisions not only for individuals but also for the government and commercial sectors.

Such a mass of data from different information sources makes it difficult to reach useful and satisfactory decisions, for three reasons: people cannot read the mass of data available; data on the web is unstructured, semi-structured and heterogeneous in nature; and information about the same product is often spread over a large number of sites and user accounts. Furthermore, differing feature formats and products that go by different names complicate opinion mining and sentiment analysis for that domain of online products.

Sentiment can be classified at the document level, the sentence or phrase level, and the aspect or feature level. The level is chosen according to the level of interest. In our research project we used the feature level, since we are collecting opinions about several aspects of the same product within the same document. We subject the data to three algorithms: naive Bayes, support vector machines and maximum entropy.

1.1. Twitter

Twitter is a real-time information network that connects you to the latest stories, ideas, opinions and news about what you find interesting. To follow conversations and the most compelling information, you simply search for the relevant accounts. Bursts of information called tweets appear in Twitter accounts. A tweet is at most 140 characters long, but it offers a lot of information to be discovered.
You will find photos, videos and conversations directly in the tweets, giving you the whole story at once. In this project we used raw Twitter data collected from several accounts using the Twitter API and preprocessed it for experimentation.
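
The paper does not list its preprocessing steps at this point, so the following is only a minimal sketch of what preprocessing raw tweet text might look like before feature extraction. The cleaning rules (lowercasing, dropping URLs and @mentions, keeping hashtag words) and the helper names `preprocess` and `ngrams` are assumptions made for illustration, not the authors' pipeline. The n-gram helper produces the unigrams, bigrams and trigrams named in the keywords.

```python
import re

# Assumed cleaning rules; the paper does not specify its own.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(tweet):
    """Normalize a raw tweet and return a list of lowercase tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    return TOKEN_RE.findall(text)


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    # Hypothetical tweet, not taken from the paper's data set.
    raw = "Loving the new phone from @Acme! #battery lasts all day http://t.co/xyz"
    tokens = preprocess(raw)
    print(tokens)
    print(ngrams(tokens, 2))
```

Token lists and n-grams of this kind could then serve as input features for classifiers such as naive Bayes, support vector machines or maximum entropy.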