IITB-Sentiment-Analysts: Participation to Task #2 Sentiment Analysis in Twitter

Karan Chawla, Ankit Ramteke, Pushpak Bhattacharyya
Dept. of Computer Science and Engineering, IIT Bombay
{chawlakaran,ankitr,pb}@cse.iitb.ac.in

Abstract

We propose a method that uses discourse relations for polarity detection in tweets. We focus on unstructured, noisy text such as tweets, on which linguistic tools like parsers and POS-taggers do not work properly. We show how conjunctions, connectives, modals and conditionals affect the sentiment of tweets. We also handle the abbreviations, slang and collocations that are commonly used in short text messages like tweets. This work targets a Web-based application that produces results in real time. The approach is an extension of previous work (Mukherjee et al. 2012).

1. Introduction

Discourse relations are an important component of natural language processing: they connect phrases and clauses to establish a coherent relation. Linguistic constructs like conjunctions, connectives, modals, conditionals and negation alter the sentiment of a sentence. For example, consider the sentence "the movie had quite a few memorable moments but I still did not like it". The overall polarity of the sentence is negative even though it has one positive and one negative clause. This is because the conjunction "but" gives more weight to the clause that follows it.

Traditional works in discourse analysis use a discourse parser (Marcu et al., 2003; Polanyi et al., 2004; Wolf et al., 2005; Welner et al., 2006; Narayanan et al., 2009; Prasad et al., 2010). Many of these works, and some other works in discourse (Taboada et al., 2008; Zhou et al., 2011), build on Rhetorical Structure Theory (RST), proposed by Mann et al. (1988), which tries to identify the relations between the nucleus and the satellite in a sentence.
Most of this work is based on well-structured text, and the methods applied to such text are not suitable for discourse analysis of micro-blogs, for the following reasons:

1. Micro-blogs like Twitter restrict a post (tweet) to 140 characters. As a result, users do not use formal language to express their views, and tweets contain abundant spelling mistakes, abbreviations, slang, collocations, discontinuities and grammatical errors. These differences cause NLP tools like POS-taggers and parsers, which are built for well-structured text, to fail frequently. Most of the methods described in previous work are therefore not well suited to discourse analysis of micro-blog-like text.

2. Web-based applications require a fast response time. Using a heavy linguistic resource like a parser increases the processing time and slows down the application.

Most of the previous work on discourse analysis does not take into consideration conjunctions, connectives, modals, conditionals, etc., and is based on the bag-of-words model with features like part-of-speech information, unigrams and bigrams, along with other domain-specific features like emoticons and hashtags. Our work harnesses the importance of discourse elements like conjunctions, connectives, modals and conditionals, and shows that, used along with the bag-of-words model, they give better sentiment classification accuracy. This work is an extension of (Mukherjee et al. 2012).

The roadmap for the rest of the paper is as follows: Section 2 studies the effect of discourse relations on sentiment analysis and identifies the critical ones. Section 3 discusses the semantic operators that influence the discourse relations. Section 4 discusses the lexicon-based classification approach. Section 5 describes the feature engineering of the important features. Section 6 lists the experiments conducted and analyses the results. Conclusions and future work are presented in Section 7.
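
As a rough illustration of the introductory example with the conjunction "but", the following minimal Python sketch scores a sentence with a tiny lexicon, flips the polarity of words that follow a negation, and gives more weight to words that follow a contrastive conjunction. The lexicon entries, the weight of 2, the negation window of 2 tokens, and the names `polarity_score` and `label` are all assumptions made for this sketch; they are not the values or the implementation used in our system.

```python
# Hypothetical toy lexicon: the real lexicon is not specified in this section.
LEXICON = {"memorable": 1, "like": 1, "good": 1, "boring": -1, "bad": -1}
NEGATIONS = {"not", "never", "no"}
CONTRAST = {"but"}  # conjunctions assumed to favour the clause that follows


def polarity_score(text, after_weight=2, neg_window=2):
    """Sum lexicon scores, flipping negated words and up-weighting
    words that come after a contrastive conjunction.

    after_weight and neg_window are illustrative assumptions.
    """
    score = 0
    weight = 1        # weight applied to the current clause
    negate_left = 0   # number of upcoming tokens still in negation scope
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in CONTRAST:
            weight = after_weight  # the following clause counts more
            negate_left = 0        # negation scope ends at the conjunction
            continue
        if word in NEGATIONS:
            negate_left = neg_window
            continue
        value = LEXICON.get(word, 0)
        if negate_left > 0:
            value = -value
            negate_left -= 1
        score += weight * value
    return score


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentence = "The movie had quite a few memorable moments but I still did not like it."
print(label(polarity_score(sentence, after_weight=1, neg_window=0)))  # plain bag of words
print(label(polarity_score(sentence, after_weight=1)))                # negation only
print(label(polarity_score(sentence)))                                # negation + "but" weighting
```

Under these assumptions, the plain bag-of-words score counts both "memorable" and "like" as positive, negation handling alone cancels the two clauses out, and only the additional weight on the clause after "but" yields the negative polarity described in the example.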