Never Too Late to Learn: Regularizing Gender Bias in
Coreference Resolution
SunYoung Park
sunyoung.p.skku@gmail.com
Sungkyunkwan University
Suwon-si, Gyeonggi-do, Republic of Korea
Kyuri Choi
gguriskku@gmail.com
Sungkyunkwan University
Suwon-si, Gyeonggi-do, Republic of Korea
Haeun Yu
haeun.yu204@gmail.com
Sungkyunkwan University
Suwon-si, Gyeonggi-do, Republic of Korea
Youngjoong Ko∗
youngjoong.ko@gmail.com / yjko@skku.edu
Sungkyunkwan University
Suwon-si, Gyeonggi-do, Republic of Korea
ABSTRACT
Leveraging pre-trained language models (PLMs) as initializers for
efficient transfer learning has become a universal approach for
text-related tasks. However, the models not only acquire language
understanding abilities but also reproduce prejudices against certain
groups present in the datasets used for pre-training. Recent studies show
that the biased knowledge acquired from the datasets affects
model predictions on downstream tasks. In this paper, we mitigate
and analyze gender biases in PLMs on coreference resolution,
one of the natural language understanding (NLU) tasks.
PLMs exhibit two types of gender bias: stereotype and skew.
The primary causes of these biases are imbalanced datasets containing
more male examples and examples that encode stereotypes about gender roles.
While previous studies mainly focused on the skew problem, we
aim to mitigate both types of gender bias in PLMs while maintaining the
models' original linguistic capabilities. Our method employs two
regularization terms, Stereotype Neutralization (SN) and Elastic
Weight Consolidation (EWC). Models trained with these methods
are neutralized and show significantly reduced bias on the
WinoBias dataset compared to the public BERT. We also introduce a
new gender bias quantification metric called the Stereotype Quan-
tification (SQ) score. In addition to the metrics, embedding visual-
izations are used to interpret how our methods have successfully
debiased the models.
CCS CONCEPTS
• Computing methodologies → Natural language processing.
KEYWORDS
Gender bias; Coreference resolution; Model debiasing; Bias mea-
surement; Ethical AI
∗Corresponding author
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
WSDM ’23, February 27-March 3, 2023, Singapore, Singapore
© 2023 Association for Computing Machinery.
ACM ISBN 978-1-4503-9407-9/23/02. . . $15.00
https://doi.org/10.1145/3539597.3570473
ACM Reference Format:
SunYoung Park, Kyuri Choi, Haeun Yu, and Youngjoong Ko. 2023. Never
Too Late to Learn: Regularizing Gender Bias in Coreference Resolution. In
Proceedings of the Sixteenth ACM International Conference on Web Search and
Data Mining (WSDM ’23), February 27-March 3, 2023, Singapore, Singapore.
ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3539597.3570473
1 INTRODUCTION
Natural language understanding (NLU) refers to a computer's un-
derstanding of human language and is the basis of all text-related
studies. As a major framework for NLU, Transformer [29]-based
pre-trained language models (PLMs), such as BERT [6] or RoBERTa
[18], have gained popularity among many AI researchers. The ad-
vantage of using PLMs is that they serve as good initializers
for efficient transfer learning on downstream tasks. However, as
massive amounts of text data are used to train PLMs, the models also
inherit the societal biases in those datasets without any constraints. They
not only learn to effectively capture linguistic features
and contextual information but also learn to discriminate against certain
groups, replicating the stereotypes found in the imbalanced datasets.
Recent studies have suggested that the biased knowledge acquired
from the datasets affects model predictions on downstream
tasks, such as ranking, dialogue systems [9, 17], language classifica-
tion [1, 14, 20], and machine translation [7, 28]. For instance, neural
ranking models based on PLMs tend to exhibit more gender bias
than other types of rankers, showing that fine-tuned PLMs
have a higher probability of reproducing societal biases [24]. More-
over, Sundararaman and Subramanian [26] pointed out that PLM-
based rankers prefer male-version documents
over female-version ones, except when the model was fine-tuned on
the ‘Child Care’ domain dataset. These results imply the existence
of data bias and drawbacks of transfer learning, and also underscore the
importance of mitigating and analyzing biases in PLMs.
Among various societal biases, this paper focuses on measuring
and alleviating gender bias in natural language understanding.¹
Coreference resolution, one of the NLU tasks, links a pronoun to its
referent based on a comprehensive understanding
of the given text. The task is often employed in various down-
stream tasks, such as abstractive summarization [21], to enhance
a model's general NLU abilities and improve its performance.
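To make the task concrete, the following is a minimal sketch of a WinoBias-style coreference example in Python. The sentences, the occupation list, and the helper function are illustrative assumptions for exposition only, not drawn from the actual WinoBias release:

```python
# A WinoBias-style example pairs an occupation referent with a gendered
# pronoun; "pro-stereotype" items match occupational stereotypes, while
# "anti-stereotype" items invert them. (Illustrative data only.)
pro_example = {
    "text": "The physician hired the secretary because he was overwhelmed.",
    "pronoun": "he",
    "referent": "physician",
}
anti_example = {
    "text": "The physician hired the secretary because she was overwhelmed.",
    "pronoun": "she",
    "referent": "physician",
}

# Hypothetical set of occupations stereotyped as male.
MALE_STEREOTYPED = frozenset({"physician", "developer", "mechanic"})

def is_pro_stereotype(example):
    """Return True when the pronoun's gender matches the occupational
    stereotype of the referent (the 'pro-stereotype' condition)."""
    referent_stereotyped_male = example["referent"] in MALE_STEREOTYPED
    pronoun_is_male = example["pronoun"] in {"he", "him", "his"}
    return referent_stereotyped_male == pronoun_is_male

print(is_pro_stereotype(pro_example))   # True
print(is_pro_stereotype(anti_example))  # False
```

A model biased toward stereotypes resolves pro-stereotype examples more accurately than anti-stereotype ones; the gap between the two conditions is what bias evaluations on such data measure.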
¹ For practical reasons, gender is restricted to binary concepts (male and female)
in this work. This follows the gender taxonomy that distinguishes genders by
biological characteristics.