© 2020 JETIR February 2020, Volume 7, Issue 2 www.jetir.org (ISSN-2349-5162)
JETIR2002083 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org 558
Detecting Hateful Content on Social Media
Aaniya Gouse¹*, Afaq Alam Khan¹
¹Department of IT, Central University of Kashmir, India.
Abstract
With the growth of hateful content across the web, detecting hate has become critically important. Self-regulatory
methods are in place to combat this nuisance, but they mostly fail to serve their purpose. In this paper we address
the issue by training a supervised classifier on semantic features. Just as semantic features work well in other
sentiment analysis experiments, our classifier outperforms the state-of-the-art methods. Based on the classifier's
observed performance, we incorporate additional features to further improve it.
Keywords: Social media, Natural Language Processing, Hate Speech, Semantic Features, Supervised Learning.
Introduction
The term hate speech has no clearly defined boundary. It remains practically undefined, and its loose boundaries
often overlap with “free speech”. For instance, calling a person a name could easily be categorized as “free
speech”, yet at the same time it could be inherently hateful. Forms of hate include cyberbullying
[Hosseinmardi et al. (2015)], abuse [Nobata et al. (2018)], flaming [Nitin et al. (2012)], toxicity
[Jigsaw (2018)], etc. Many social media users have repeatedly faced hatred online, in forms ranging from
threats to abuse, and hateful language contributes substantially to users abandoning social media altogether.
Many platforms implement self-regulatory methods to keep a check on hateful content propagated on social
media, such as allowing users to report a profile as offensive or as violating certain guidelines. Because these
methods depend entirely on users’ discretion and on their own definitions of hate, they cannot be relied upon.
Therefore, as online communication grows, the need for an automated hate detector keeps increasing.
Prospectively, our hate speech classifier aims to help prevent hate crime by detecting hateful content as it
arises. The broader goal of this research is to help combat genocide, suicide, cyberbullying, trolling, terrorist
propaganda, etc. Our challenge is to detect hate even when it is disguised as sarcasm, offense or misspellings.
State-of-the-art methods mainly employ statistical features, but these do not lead to high accuracy and are
error-prone as well. Since semantic features have been observed to perform better in other classification
tasks, we decided to use them here, which distinguishes our approach from existing hate speech classifiers.
In addition to semantic features, we incorporate lexical, morphological and contextual features to make the
classifier more capable.
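The combination of feature types described above can be illustrated with a minimal sketch, assuming a simple bag-of-features representation: lexical features (the words themselves), morphological features (character trigrams) and a stand-in for semantic features (coarse categories from a small, hypothetical lexicon, `SEMANTIC_LEXICON`). Contextual features are omitted. The paper does not name its learning algorithm, so a multinomial Naive Bayes classifier is swapped in here purely for illustration; the toy sentences and the three labels (`hate`, `offensive`, `neither`) are likewise assumptions, not the authors' data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon mapping words to coarse semantic categories.
# It only stands in for the paper's semantic features, whose source is not specified.
SEMANTIC_LEXICON = {
    "hate": "hostility",
    "destroy": "violence",
    "kill": "violence",
    "idiot": "insult",
    "stupid": "insult",
    "love": "affection",
    "thanks": "politeness",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Combine lexical, morphological and semantic features into one bag of counts."""
    feats = Counter()
    for tok in tokenize(text):
        feats["lex=" + tok] += 1                  # lexical: the word itself
        padded = "#" + tok + "#"                  # mark word boundaries
        for i in range(len(padded) - 2):          # morphological: character trigrams
            feats["tri=" + padded[i:i + 3]] += 1
        if tok in SEMANTIC_LEXICON:               # semantic: lexicon category
            feats["sem=" + SEMANTIC_LEXICON[tok]] += 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over feature counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class (for priors)
        self.feat_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()                        # all features seen in training
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f, c in feats.items():              # add smoothed log likelihoods
                score += c * math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training examples (illustrative only, not a real dataset).
TRAIN_TEXTS = [
    "I hate you and will destroy you",
    "you stupid idiot",
    "thanks, I love this",
]
TRAIN_LABELS = ["hate", "offensive", "neither"]

clf = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(clf.predict("you stupid idiot"))  # prints "offensive"
```

Character trigrams let a word and its misspelled variants share some features, which is one reason morphological features can complement purely lexical ones.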
Contributions of the Authors
This research designs a general-purpose multi-class classifier capable of detecting hateful content on social
media. Specifically, our contributions are as follows:
An efficient method for text pre-processing that corrects misspellings and removes slang.
A classifier capable of identifying hateful content within a given corpus.
Linguistic analysis of text to reveal its syntactic and semantic details.
Evaluation of the classifier on datasets of varying sizes and types to observe variations in performance.
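The pre-processing contribution listed above (correcting misspellings and removing slang) might look like the following minimal sketch. The `SLANG` dictionary, the `VOCABULARY` list and the 0.8 similarity cutoff are hypothetical choices for illustration, since the paper does not specify its resources. Fuzzy matching uses Python's standard `difflib.get_close_matches`, which is one possible technique and not necessarily the authors' method.

```python
import difflib
import re

# Hypothetical slang dictionary; a real system would need a much larger curated resource.
SLANG = {
    "u": "you",
    "ur": "your",
    "r": "are",
    "ppl": "people",
    "idk": "i do not know",
}

# Hypothetical reference vocabulary used for spelling correction.
VOCABULARY = ["you", "your", "are", "stupid", "idiot", "people", "these", "hate", "all", "i"]


def squeeze_repeats(token):
    """Collapse runs of three or more identical characters to two ("sooooo" -> "soo")."""
    return re.sub(r"(.)\1{2,}", r"\1\1", token)


def correct_spelling(token, vocabulary, cutoff=0.8):
    """Replace a token with its closest vocabulary entry, if one is similar enough."""
    if token in vocabulary:
        return token
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def preprocess(text):
    """Lowercase, tokenize, squeeze elongations, expand slang, then correct spelling."""
    tokens = re.findall(r"[a-z']+", text.lower())
    out = []
    for tok in tokens:
        tok = squeeze_repeats(tok)
        if tok in SLANG:
            out.append(SLANG[tok])  # slang expansion may yield several words
        else:
            out.append(correct_spelling(tok, VOCABULARY))
    return " ".join(out)


print(preprocess("U r stupidd, idk"))  # prints "you are stupid i do not know"
```

Slang is looked up before fuzzy matching so that short forms such as `u` are resolved by the dictionary rather than by similarity scores; with a realistic corpus, the reference vocabulary would also have to be large enough that rare but correctly spelled words are not pulled onto frequent ones.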