International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 04 Issue: 08 | Aug-2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 2276

Hybrid Classifier for Sentiment Analysis Using Effective Pipelining

Akhil Sharma [1], Aman Sharma [2], Rajeev Kumar Singh [3], Dr. Madhur Deo Upadhayay [4]

1,2 Electronics and Communication Engineering, Shiv Nadar University, Uttar Pradesh, India
3 Assistant Professor, Dept. of Electrical Engineering, Shiv Nadar University, Uttar Pradesh, India
4 Assistant Professor, Dept. of Computer Science, Shiv Nadar University, Uttar Pradesh, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Social media has become a platform for people to express their thoughts, opinions and ideas. Facebook, Twitter, Google+ and the like have emerged as data hubs for people wanting to improve market sales, predict outcomes of events, and study characteristics of human behavior. Polling and surveys are outdated and lengthy techniques. With opinion mining and sentiment analysis, data extraction and classification become easy. In this paper, we have used a hybrid method for analyzing sentiments. This method employs a pipeline system consisting of rules, lexicon and machine learning based classifiers, in which a tweet, after preprocessing, is first classified by the lexicon and rules classifier and is sent to the machine learning module only if the tweet's analysis score does not reach a predetermined threshold value. A comparison is made between the individual approaches (rules, lexicon, and machine learning) and the hybrid classifier on the basis of F-score, recall and precision.

Key Words: opinion mining, sentiment analysis, rules-based, lexicon-based, classifier, hybrid approach.

1. INTRODUCTION

Twitter is one of the most popular microblogging and social networking websites [4]. People post on Twitter from time to time, an activity called tweeting. The diversity of people on Twitter makes the tweets more versatile and valuable [7]. Twitter has therefore become one of the most valuable places to find opinions on any issue. This allows computer scientists to perform credible sentiment analysis and develop pathways for data mining. This data can be used in marketing, sales or poll analysis. Timely feedback on products can be collected by evaluating people's tweets on Twitter [1,2,3]. Researchers can use the data sets to build unsolicited public opinion polls on important social matters [1]. Social media has become a powerful tool for the general public to get involved with politics, media and business.

Polls are expensive and time consuming [1,2]. With continued improvement in data analysis techniques, these tasks have become practically viable. The credibility of data and results is higher than before. Manual surveys and polls are not always trustworthy, whereas there is significantly less, or even negligible, scope for human error in data mining and subsequent analysis. The political inclinations and interests of the general public will be available to parties to understand and to prepare for their campaigns. The needs of people and complaints from society will become accessible to politicians. The gap between the government and the public can be bridged with ease. Predictions pertaining to elections or major events can also be extracted in one go [1].

After any incident, protest or social unrest, people log into social media websites to post or comment in order to express their thoughts and opinions. Social media is powerful in terms of spreading social awareness about crimes, diseases, and other epidemics. Twitter has become a solid and trustworthy resource not only for its users but also for researchers.
The consolidated data can reveal clear visual trends in people's opinions. An unprecedented view of public opinion is on display on social media, especially on Twitter [1].

Sentiment analysis is a field of study concerned with how sentiments and opinions are expressed in texts. Approaches used to classify sentiments include rules-based, lexicon-based, machine learning, and deep learning techniques [2,3,10,11]. Classifying tweets on the basis of pre-fixed rules is called the rules-based approach. Using opinion words, or the lexicon, to determine opinion orientations is called the lexicon-based approach [1,5]. The rules-based approach together with the lexicon-based approach has high precision but low recall [2]. Emoticons, informal language and abbreviations are some of the parts of unstructured textual data that may go undetected or unclassified in the lexicon-based approach. For example, "Mauritius is a gr8 holiday destination," is a sentence of positive demeanor. However, a classifier using the lexicon-based approach might classify it as neutral or leave it unclassified. Although it is possible to add these expressions to the opinion lexicon, their usage changes continuously, which makes such text hard to classify [2].

Another method used for sentiment analysis is the machine learning approach [4,8]. This method is effective for classifying sentences and documents by training the classifier to determine positive, negative and neutral sentiments [4,8]. Since manual labelling of a large set of tweets is often time consuming and difficult, this approach is not easy to implement [2]. Deep learning algorithms could provide the most accurate results, but these techniques are extremely computationally expensive to train. To handle the large number of matrix multiplication operations that deep learning involves, substantial investment is needed to upgrade the IT infrastructure for more processing power.
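
The lexicon-based scoring and the threshold hand-off to a machine learning module described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the implementation used in this paper: the toy `LEXICON`, the `THRESHOLD` value, the use of the absolute score as the confidence test, and the `stub_ml_classifier` placeholder that stands in for a trained model. The rules stage is omitted for brevity.

```python
# Hypothetical toy opinion lexicon (word -> polarity weight); illustrative only.
LEXICON = {
    "great": 1.0, "good": 0.5, "love": 1.0,
    "bad": -0.5, "terrible": -1.0, "hate": -1.0,
}

# Assumed confidence threshold; the section does not state the actual value.
THRESHOLD = 0.5


def lexicon_score(tweet):
    """Sum the polarity weights of the words found in the lexicon."""
    tokens = tweet.lower().split()
    # Strip simple punctuation so that "destination," matches "destination".
    return sum(LEXICON.get(tok.strip(".,!?\"'"), 0.0) for tok in tokens)


def label_from_score(score):
    """Map a signed score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def stub_ml_classifier(tweet):
    """Placeholder for a trained model; a real system would call it here."""
    return "positive" if "gr8" in tweet.lower() else "neutral"


def hybrid_classify(tweet, ml_classifier, threshold=THRESHOLD):
    """Use the lexicon first; fall back to the ML module on a weak score."""
    score = lexicon_score(tweet)
    if abs(score) >= threshold:
        return label_from_score(score), "lexicon"
    return ml_classifier(tweet), "ml"


print(hybrid_classify("Mauritius is a great holiday destination", stub_ml_classifier))
print(hybrid_classify("Mauritius is a gr8 holiday destination,", stub_ml_classifier))
```

In this sketch the tweet containing "great" is resolved by the lexicon alone, while the tweet containing "gr8" scores zero against the lexicon and is therefore passed on to the fallback classifier, which mirrors the low-recall behaviour noted for informal abbreviations.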
Moreover, deep learning requires an immense amount of data to train the model compared to traditional machine