Never Too Late to Learn: Regularizing Gender Bias in Coreference Resolution

SunYoung Park (sunyoung.p.skku@gmail.com), Kyuri Choi (gguriskku@gmail.com), Haeun Yu (haeun.yu204@gmail.com), Youngjoong Ko (youngjoong.ko@gmail.com/yjko@skku.edu)
Sungkyunkwan University, Suwon-si, Gyeonggi-do, Republic of Korea

ABSTRACT

Leveraging pre-trained language models (PLMs) as initializers for efficient transfer learning has become a universal approach for text-related tasks. However, the models not only learn language understanding abilities but also reproduce prejudices against certain groups present in the datasets used for pre-training. Recent studies show that the biased knowledge acquired from these datasets affects model predictions on downstream tasks. In this paper, we mitigate and analyze the gender biases in PLMs with coreference resolution, which is one of the natural language understanding (NLU) tasks. PLMs exhibit two types of gender bias: stereotype and skew. The primary causes of these biases are imbalanced datasets with more male examples and examples that encode stereotypes about gender roles. While previous studies mainly focused on the skew problem, we aim to mitigate both gender biases in PLMs while maintaining the model's original linguistic capabilities. Our method employs two regularization terms, Stereotype Neutralization (SN) and Elastic Weight Consolidation (EWC). Models trained with these methods are shown to be neutralized and to reduce the biases significantly on the WinoBias dataset compared to the public BERT. We also introduce a new gender bias quantification metric called the Stereotype Quantification (SQ) score. In addition to the metrics, embedding visualizations are used to interpret how our methods have successfully debiased the models.
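The abstract names Elastic Weight Consolidation (EWC) as one of the two regularization terms. As a rough illustration of the general technique (Kirkpatrick et al.'s formulation, not necessarily this paper's exact variant), EWC penalizes drift of each parameter from its pre-trained value, weighted by that parameter's Fisher information, so that weights important to the PLM's original linguistic knowledge resist change during debiasing. A minimal NumPy sketch, with all names and values illustrative:

```python
import numpy as np

def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.
    `anchor_params` are the pre-trained weights theta*, and `fisher`
    holds the (diagonal) Fisher information F_i estimated on the
    original task. Large F_i makes parameter i expensive to move."""
    diff = params - anchor_params
    return 0.5 * lam * np.sum(fisher * diff ** 2)

# Toy example: two parameters drift equally far from the anchor, but
# the first is "important" (high Fisher) and is penalized 100x harder.
theta = np.array([1.0, 1.0])
theta_star = np.array([0.0, 0.0])
fisher = np.array([10.0, 0.1])
print(ewc_penalty(theta, theta_star, fisher, lam=2.0))  # 0.5 * 2 * (10 + 0.1) = 10.1
```

In training, this penalty would simply be added to the task loss, letting the debiasing objective reshape only the parameters the Fisher information marks as unimportant to the original language-modeling ability.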
CCS CONCEPTS

· Computing methodologies → Natural language processing.

KEYWORDS

Gender bias; Coreference resolution; Model debiasing; Bias measurement; Ethical AI

Corresponding author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. WSDM '23, February 27-March 3, 2023, Singapore, Singapore © 2023 Association for Computing Machinery. ACM ISBN 978-1-4503-9407-9/23/02. . . $15.00 https://doi.org/10.1145/3539597.3570473

ACM Reference Format: SunYoung Park, Kyuri Choi, Haeun Yu, and Youngjoong Ko. 2023. Never Too Late to Learn: Regularizing Gender Bias in Coreference Resolution. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (WSDM '23), February 27-March 3, 2023, Singapore, Singapore. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3539597.3570473

1 INTRODUCTION

Natural language understanding (NLU) refers to a computer's understanding of human language and is the basis of all text-related studies. As a major framework for NLU, Transformer [29]-based pre-trained language models (PLMs), such as BERT [6] or RoBERTa [18], have gained popularity among many AI researchers. The advantage of using PLMs is that the models can serve as good initializers for efficient transfer learning on downstream tasks. However, as massive amounts of text data are used to train PLMs, the models also inherit the societal biases in these datasets without any constraints.
They not only learn how to effectively capture linguistic features and contextual information but also learn to discriminate against certain groups, replicating the stereotypes found in imbalanced datasets. Recent studies have suggested that the biased knowledge acquired from the datasets affects model predictions on downstream tasks, such as ranking, dialogue systems [9, 17], language classification [1, 14, 20], and machine translation [7, 28]. For instance, neural ranking models based on PLMs tend to exhibit more gender bias than other types of rankers, showing that fine-tuned PLMs have a higher probability of reproducing societal biases [24]. Moreover, Sundararaman and Subramanian [26] pointed out that PLM-based rankers prefer male-version documents over female-version ones, except when the model was fine-tuned on the 'Child Care' domain dataset. These results imply the existence of data bias and the drawbacks of transfer learning, and they also mark the importance of mitigating and analyzing biases in PLMs. Among the various societal biases, this paper focuses on measuring and alleviating gender bias in natural language understanding.¹ Coreference resolution, one of the NLU tasks, links a referent and a pronoun based on a comprehensive understanding of the given text. The task is often employed in various downstream tasks, such as abstractive summarization [21], to enhance a model's general NLU abilities and improve model performance.

¹ For practical reasons, gender is restricted to the binary concepts (male and female) in this work. This follows the gender taxonomy that distinguishes the two by biological characteristics.
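Gender bias in coreference resolution is commonly probed with WinoBias-style sentence pairs, where the same pronoun must be linked to an occupation either in line with (pro-stereotypical) or against (anti-stereotypical) its gender stereotype; the accuracy gap between the two conditions is one simple bias proxy. The sketch below is a generic illustration of that idea, not the paper's SQ score, and the sentences and predictions are invented:

```python
def accuracy_gap(results):
    """Bias proxy: |acc(pro-stereotypical) - acc(anti-stereotypical)|.
    `results` is a list of (condition, resolved_correctly) pairs;
    0.0 means the model resolves pronouns equally well in both conditions."""
    pro = [ok for cond, ok in results if cond == "pro"]
    anti = [ok for cond, ok in results if cond == "anti"]
    return abs(sum(pro) / len(pro) - sum(anti) / len(anti))

# WinoBias-style pair (illustrative): the pronoun corefers with
# "physician" in both sentences; only the pronoun's gender changes.
#   pro:  "The physician hired the secretary because he was overwhelmed."
#   anti: "The physician hired the secretary because she was overwhelmed."
# Toy predictions from a hypothetically biased model: perfect on the
# pro-stereotypical condition, coin-flip on the anti-stereotypical one.
results = [("pro", True), ("pro", True), ("anti", False), ("anti", True)]
print(accuracy_gap(results))  # 1.0 - 0.5 = 0.5
```

A debiasing method like the one this paper proposes would aim to drive this gap toward zero without hurting overall coreference accuracy.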