Citation: Vanetik, N.; Mimoun, E. Detection of Racist Language in French Tweets. Information 2022, 13, 318. https://doi.org/10.3390/info13070318
Academic Editor: Kostas Stefanidis
Received: 5 May 2022
Accepted: 27 June 2022
Published: 29 June 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
information
Article
Detection of Racist Language in French Tweets
Natalia Vanetik †,* and Elisheva Mimoun †
Software Engineering Department, Shamoon College of Engineering, 56 Bialik St. Be’er Sheva 8410802, Israel;
elishto@ac.sce.ac.il
* Correspondence: natalyav@sce.ac.il
† These authors contributed equally to this work.
Abstract: Toxic online content has become a major issue in recent years due to the exponential
increase in the use of the internet. In France, there has been a significant increase in hate speech
against migrant and Muslim communities following events such as Great Britain’s exit from the EU,
the Charlie Hebdo attacks, and the Bataclan attacks. Therefore, the automated detection of offensive
language and racism is in high demand, and it is a serious challenge. Unfortunately, fewer
datasets are annotated for racist speech than for general hate speech, especially for French.
This paper attempts to bridge this gap by (1) proposing and evaluating a new dataset intended for
automated racist speech detection in French; (2) performing a case study with multiple supervised
models and text representations for the task of racist language detection in French; and (3) performing
cross-lingual experiments.
Keywords: hate speech; racist speech detection; French social media
1. Introduction
The exponential growth of social media such as Twitter and community forums has
revolutionized communication and content publishing, but it is also increasingly being
exploited for the spread of hate speech and the organization of hate activity. The term “hate
speech” has been defined as “any communication that denigrates a person or group based
on certain characteristics (called types of hate or classes of hate) such as race, color, ethnicity,
gender, sexual orientation, nationality, religion or other characteristics” [1]. An official EU
definition of hate speech [2] states that “it is based on the unjustified assumption that a
person or a group of persons are superior to others; it incites acts of violence or discrimination,
thus undermining respect for minority groups and damaging social cohesion.”
Hate content on the internet can create fear, anxiety and threats to the safety of
individuals. A business or online platform associated with such content may damage its own
reputation or that of its products. Failure to moderate such content can cost
the company in multiple ways: loss of users, a drop in stock value, sanctions from legal
authorities, etc. A news article [3] and several academic studies [4,5] indicate that during
the recent COVID-19 crisis, there was a drastic increase in hate speech against people from
China and other Asian countries on Twitter.
In many countries, online hate speech is a crime and is punishable by law. In such cases,
social media platforms are held liable if they do not remove hateful content quickly. However, the
anonymity and mobility that these media offer mean that the creation and dissemination
of hate speech, which can lead to hate crimes, occur effortlessly in a virtual landscape
that eludes traditional law enforcement. Manual analysis and moderation of this content
are impossible due to the enormous amount of data circulating on the internet.
An effective solution to this problem would be to automatically detect and moderate hate
speech comments.
In the EU, surveys and reports focusing on young people in the European Economic
Area (EEA) region show an increase in hate speech and related crimes based on religious