Title
Assessing and mitigating bias in natural language systems
Author
Abstract
As natural language-based technologies continue to develop and play an increasingly prominent role in society, growing attention is being paid to the ethical issues surrounding their use, with bias a central concern. There is a growing body of evidence highlighting biases, such as gender bias, within natural language models. Although considerable work has been done to understand and address this issue, significant challenges remain in detecting, measuring, and effectively mitigating bias. This thesis addresses two main themes. The first part consists primarily of empirical investigations of existing techniques for detecting, measuring, and mitigating bias in natural language processing (NLP). The second part focuses on developing solutions to mitigate bias in language-based technologies and in human-generated text. We first investigate existing techniques for measuring bias in natural language models. Specifically, we review the literature on fairness metrics for pre-trained language models and empirically evaluate their consistency and compatibility. We investigate how factors such as templates, attribute and target seeds, and the choice of embeddings affect how bias is quantified. Second, we investigate the relationship between bias in pre-trained language models and in models fine-tuned for downstream applications. We design a probe to investigate the effects that some major intrinsic gender bias mitigation strategies have on downstream text classification tasks. We find that some intrinsic bias mitigation techniques tend to hide bias rather than resolve it, and we show inconsistencies in how bias measurement techniques quantify bias under certain mitigation techniques. We also find that bias inherent in a pre-trained model has little material effect on downstream fairness.
Third, we develop an automated approach to generating parallel data for training counterfactual text generation models for counterfactual data augmentation (CDA), limiting the need for human intervention. Although CDA is a widely used mitigation strategy in NLP, existing approaches have significant shortcomings, which we also highlight in this thesis. Finally, we propose a text style transfer technique to automatically mitigate bias in textual data. Our text style transfer model can be trained on non-parallel data, and we demonstrate that our approach overcomes the limitations of many existing text style transfer techniques.
Language
English
Publication
Antwerpen: University of Antwerp, Faculty of Science, 2024
DOI
10.63028/10067/2090950151162165141
Volume/pages
xvi, 136 p.
Note
Calders, Toon [Supervisor]
Full text (open access)