Information

Article

Detection of Racist Language in French Tweets

Natalia Vanetik †,* and Elisheva Mimoun †

Software Engineering Department, Shamoon College of Engineering, 56 Bialik St., Be’er Sheva 8410802, Israel; elishto@ac.sce.ac.il
* Correspondence: natalyav@sce.ac.il
† These authors contributed equally to this work.

Abstract: Toxic online content has become a major issue in recent years due to the exponential increase in the use of the internet. In France, there has been a significant increase in hate speech against migrant and Muslim communities following events such as Great Britain’s exit from the EU, the Charlie Hebdo attacks, and the Bataclan attacks. Therefore, the automated detection of offensive language and racism is in high demand, and it is a serious challenge. Unfortunately, fewer datasets annotated for racist speech are available than for general hate speech, especially for French. This paper attempts to bridge this gap by (1) proposing and evaluating a new dataset intended for automated racist speech detection in French; (2) performing a case study with multiple supervised models and text representations for the task of racist language detection in French; and (3) performing cross-lingual experiments.

Keywords: hate speech; racist speech detection; French social media

Citation: Vanetik, N.; Mimoun, E. Detection of Racist Language in French Tweets. Information 2022, 13, 318. https://doi.org/10.3390/info13070318

Academic Editor: Kostas Stefanidis

Received: 5 May 2022
Accepted: 27 June 2022
Published: 29 June 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The exponential growth of social media such as Twitter and community forums has revolutionized communication and content publishing, but it is also increasingly being exploited for the spread of hate speech and the organization of hate activity. The term “hate speech” has been defined as “any communication that denigrates a person or group based on certain characteristics (called types of hate or classes of hate) such as race, color, ethnicity, gender, sexual orientation, nationality, religion or other characteristics” [1]. An official EU definition of hate speech [2] states that “it is based on the unjustified assumption that a person or a group of persons are superior to others; it incites acts of violence or discrimination, thus undermining respect for minority groups and damaging social cohesion.”

Hate content on the internet can create fear, anxiety and threats to the safety of individuals. A business or online platform that hosts such content may damage its own reputation or that of its product. Failure to moderate such content can cost the company in multiple ways: loss of users, a drop in stock value, sanctions from legal authorities, etc. A news article [3] and several academic studies [4,5] indicate that during the recent COVID-19 crisis, there was a drastic increase in hate speech against people from China and other Asian countries on Twitter.

In many countries, online hate speech is a crime and is punishable by law. In such cases, social media platforms are held liable if they do not remove hateful content quickly. However, the anonymity and mobility that these media offer mean that the creation and dissemination of hate speech, which can lead to hate crimes, occur effortlessly in a virtual landscape that eludes traditional law enforcement. Manual analysis of this content and its moderation is impossible due to the enormous amount of data circulating on the internet.
An effective solution to this problem would be to automatically detect and moderate hate speech comments. In the EU, surveys and reports focusing on young people in the European Economic Area (EEA) region show an increase in hate speech and related crimes based on religious