Ruddit: Norms of Offensiveness for English Reddit Comments

Rishav Hada 1,*, Sohi Sudhir 1,*, Pushkar Mishra 2, Helen Yannakoudakis 3, Saif M. Mohammad 4, Ekaterina Shutova 1
1 ILLC, University of Amsterdam  2 Facebook AI, London  3 Dept. of Informatics, King’s College London  4 National Research Council Canada
rishavhada@gmail.com, sohigre@gmail.com, pushkarmishra@fb.com, helen.yannakoudakis@kcl.ac.uk, saif.mohammad@nrc-cnrc.gc.ca, e.shutova@uva.nl
* Both authors contributed equally.

Abstract

Warning: This paper contains comments that may be offensive or upsetting.

On social media platforms, hateful and offensive language negatively impacts the mental well-being of users and the participation of people from diverse backgrounds. Automatic methods to detect offensive language have largely relied on datasets with categorical labels. However, comments can vary in their degree of offensiveness. We create the first dataset of English language Reddit comments that has fine-grained, real-valued scores between -1 (maximally supportive) and 1 (maximally offensive). The dataset was annotated using Best–Worst Scaling, a form of comparative annotation that has been shown to alleviate known biases of using rating scales. We show that the method produces highly reliable offensiveness scores. Finally, we evaluate the ability of widely used neural models to predict offensiveness scores on this new dataset.

1 Introduction

Social media platforms serve as a medium for the exchange of ideas on a range of topics, from the personal to the political. This exchange can, however, be disrupted by offensive or hateful language. Such language is pervasive online (Statista, 2020b), and exposure to it may have numerous negative consequences for the victim’s mental health (Munro, 2011). Automated offensive language detection has thus been gaining interest in the NLP community as a promising direction to better understand the nature and spread of such content.

There are several challenges in the automatic detection of offensive language (Wiedemann et al., 2018). The NLP community has adopted various definitions for offensive language, classifying it into specific categories. For example, Waseem and Hovy (2016) classified comments as racist, sexist, or neither; Davidson et al. (2017) as hate-speech, offensive but not hate-speech, or neither offensive nor hate-speech; and Founta et al. (2018) as abusive, hateful, normal, or spam. Schmidt and Wiegand (2017); Fortuna and Nunes (2018); Mishra et al. (2019); Kiritchenko and Nejadgholi (2020) summarize the different definitions. However, these categories have significant overlaps with each other, creating ill-defined boundaries and thus introducing ambiguity and annotation inconsistency (Founta et al., 2018). A further challenge is that, after encountering several highly offensive comments, an annotator might find subsequent moderately offensive comments to not be offensive (de-sensitization) (Kurrek et al., 2020; Soral et al., 2018).

At the same time, existing approaches do not take into account that comments can be offensive to different degrees. Knowing the degree of offensiveness of a comment has practical implications when taking action against inappropriate behaviour online, as it allows for more fine-grained analysis and prioritization in moderation.

The representation of the offensive class in a dataset is often boosted using different strategies. The most common strategy is keyword-based sampling.
This results in datasets that are rich in explicit offensive language (language that is unambiguous in its potential to be offensive, such as comments using slurs or swear words (Waseem et al., 2017)) but lack cases of implicit offensive language (language whose true offensive nature is obscured by the absence of unambiguous swear words, the use of sarcasm or offensive analogies, and so on (Waseem et al., 2017; Wiegand et al., 2021)) (Waseem, 2016; Wiegand et al., 2019). Keyword-based sampling also often results in spurious correlations (e.g., sports-related expressions such as announcer and sport occur very frequently in offensive tweets). Lastly, existing datasets consider of-
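As context for the real-valued scores mentioned in the abstract, the sketch below illustrates how Best–Worst Scaling annotations are commonly aggregated into scores in [-1, 1] via the standard counting procedure: an item's score is the proportion of times it was chosen as best (most offensive) minus the proportion of times it was chosen as worst (least offensive). This is a minimal illustration under that assumption; the function name, data layout, and example comment IDs are hypothetical, not the authors' released code.

    from collections import Counter

    def bws_scores(annotations):
        """Aggregate Best-Worst Scaling annotations into real-valued scores.

        Each annotation is an (items, best, worst) triple: the items shown
        in one tuple, the item chosen as most offensive, and the item chosen
        as least offensive. An item's score is the number of times it was
        chosen as best minus the number of times it was chosen as worst,
        divided by the number of tuples it appeared in, yielding a value
        in [-1, 1].
        """
        best, worst, appearances = Counter(), Counter(), Counter()
        for items, chosen_best, chosen_worst in annotations:
            for item in items:
                appearances[item] += 1
            best[chosen_best] += 1
            worst[chosen_worst] += 1
        return {item: (best[item] - worst[item]) / appearances[item]
                for item in appearances}

    # Hypothetical example: comment c1 is judged most offensive in both
    # tuples it appears in, so it receives the maximal score of 1.0.
    annotations = [
        (["c1", "c2", "c3", "c4"], "c1", "c3"),
        (["c1", "c2", "c4", "c5"], "c1", "c5"),
    ]
    print(bws_scores(annotations))
    # {'c1': 1.0, 'c2': 0.0, 'c3': -1.0, 'c4': 0.0, 'c5': -1.0}

Because each comment's score depends only on counts of best/worst choices across the tuples it appears in, this aggregation produces fine-grained, comparable scores without asking annotators to use an absolute rating scale.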