International Journal of Scientific and Research Publications, Volume 8, Issue 3, March 2018, ISSN 2250-3153
http://dx.doi.org/10.29322/IJSRP.8.3.2018.p7517
www.ijsrp.org

Using Naïve Bayes Algorithm in detection of Hate Tweets

Kelvin Kiema Kiilu, George Okeyo, Richard Rimiru, Kennedy Ogada
Department of Computing, Jomo Kenyatta University of Agriculture and Technology

DOI: 10.29322/IJSRP.8.3.2018.p7517

Abstract- Social media has become a very powerful tool for information exchange, as it allows users not only to consume information but also to share and discuss topics of interest. Nevertheless, online social platforms are beset with hateful speech: content that expresses hatred for a person or group of people. Such content can frighten, intimidate, or silence platform users, and some of it can incite other users to commit violence. Furthermore, social media gives users the freedom to express their thoughts in text without following traditional grammar, which makes it difficult to mine social media for insights. Despite widespread recognition of the problems posed by social media content, reliable solutions, even for detecting hateful speech, are lacking. The main goal of this study is to develop a reliable tool for detecting hate tweets. This paper develops an approach for detecting and classifying hateful speech that uses content produced by self-identifying hateful communities on Twitter. Experimental results showed that the Naive Bayes classifier achieved significantly better performance than existing hate speech detection methods, with precision, recall, and accuracy values of 58%, 62%, and 67.47%, respectively.

Index Terms- Hate tweets, Naive Bayes, Text Classification, Sentiment analysis.

I. INTRODUCTION

In recent years, Twitter has become one of the most popular micro-blogging social media platforms, allowing millions of people to share their daily opinions and thoughts through real-time status updates (Conover et al., 2013). Twitter has 270 million active users, and 500 million tweets are sent per day (M.C. Wellons, 2015). Owing to the high reach and popularity of social media websites worldwide, organizations also use these websites to plan and mobilize protests and public demonstrations (Muthiah et al., 2015). Twitter is a well-known platform for sharing opinions and information, and it is used mostly before, during and after live events (Bollen et al., 2011).

Online spaces are often exploited and misused to spread content that can be degrading, abusive, or otherwise harmful to people. Twitter prohibits users from posting violent threats, harassment, and hateful content. However, many users still disobey the rules and use their Twitter accounts to spread hate speech and negative words. An important and elusive form of such language is hateful speech: content that expresses hatred of a group in society. Hateful speech has become a major problem for every kind of online platform where user-generated content appears, from the comment sections of news websites to real-time chat sessions in immersive games. Such content can alienate users and can also support radicalization and incite violence (Allan, 2013). It is through such access to Twitter that various users have used the platform to propagate and promote hateful tweets aimed at various target groups and individuals (Wilkinson, 1997). No formal definition of hate speech exists, but there is a consensus that it is speech that targets disadvantaged social groups in a manner that is potentially harmful to them (Jacobs & Potter, 2000; Walker, 1994).
In Kenya, hate speech has been defined as any form of speech that degrades others, promotes hatred, and encourages violence against a group on the basis of criteria including religion, race, color or ethnicity. It includes speech, publication or broadcast that represents a group as inherently inferior, or that degrades, dehumanizes and demeans a group (KHRC, 2010). Importantly, the definition does not include all instances of offensive language, because people often use terms that are highly offensive to certain groups but in a qualitatively different manner. Anything tweeted can reach a huge audience, and its effects can be far-reaching. We were concerned with the task of detecting, identifying and analyzing the spread of hate speech sentiments on social media, specifically on Twitter in Kenya.

Sentiment analysis is an area of natural language processing that aims to determine the opinions and attitudes a writer expresses in a text, or the writer's attitude towards specific topics. Sentiment describes an opinion or attitude expressed by an individual, the opinion holder, about an entity, the target. The research field of sentiment analysis has developed algorithms to automatically detect sentiment in text (Pang & Lee, 2008). While some algorithms identify the objects discussed and the polarity (positive, negative or neutral) of the sentiment expressed about them (Gamon et al., 2005), others assign an overall polarity to a text, such as a movie review (Pang & Lee, 2004).

Three common sentiment analysis approaches are full-text machine learning, lexicon-based methods and linguistic analysis. In standard machine learning (e.g., Witten & Frank, 2005), a set of texts annotated for polarity by human coders is used to train an algorithm to detect features associated with the positive, negative and neutral categories. The text features used are typically sets of all words, word pairs and word triples found in the texts.
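The full-text machine learning approach described above can be illustrated with a minimal sketch. It extracts all words, word pairs and word triples from each text and trains a small multinomial Naive Bayes classifier on them. Naive Bayes is chosen here only because it is the classifier named in the title. The whitespace tokenization, the Laplace smoothing parameter `alpha`, the names `ngram_features` and `NaiveBayes`, and the toy movie-review sentences are all assumptions made for illustration, not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, max_n=3):
    """Return all word n-grams (n = 1..max_n) of a lowercased text.

    Tokenization by whitespace is an illustrative assumption.
    """
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for feat in ngram_features(text):
                if feat in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy human-annotated texts (invented for this sketch).
train_texts = [
    "i love this film",
    "a great and moving story",
    "i hate this film",
    "a dull and boring story",
    "the film opens on friday",
    "the story is set in nairobi",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("i love it"))  # positive
```

In this sketch, n-grams that never occurred in training are skipped at prediction time; other treatments of unseen features are possible, and a real system would need far more annotated data than these six sentences.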
The lexicon approach starts with lists of words that are pre-coded for polarity and sometimes also for strength. It uses the occurrence of these words within texts to predict the polarity of those texts. A linguistic analysis, in contrast, exploits the grammatical structure of text to predict its polarity, often in conjunction with a lexicon. I