Improving Dutch vaccine hesitancy monitoring via multi-label data augmentation with GPT-3.5

Van Nooten, Jens; Daelemans, Walter

doi:10.18653/V1/2023.WASSA-1.23

Abstract

In this paper, we leverage the GPT-3.5 language model, through both the Chat-GPT API and the GPT-3.5 API, to generate realistic examples of anti-vaccination tweets in Dutch, with the aim of augmenting an imbalanced multi-label vaccine hesitancy argumentation classification dataset. In line with previous research, we devise a prompt that instructs the model, on the one hand, to generate realistic examples based on the human dataset (gold standard) and, on the other hand, to assign one or multiple labels to the generated instances. We then augment our gold standard data with the generated examples and evaluate the impact in a cross-validation setting with several state-of-the-art Dutch BERT models. This augmentation technique predominantly improves F1 for underrepresented classes and increases overall recall, paired with a slight decrease in precision for more common classes. Furthermore, we examine how well the synthetic data generalises to human data in the classification task. To our knowledge, we are the first to utilise Chat-GPT and GPT-3.5 for augmenting the dataset of a Dutch multi-label classification task.

Language

English

Source (book)

Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, July 2023; Toronto, Canada

Publication

Toronto : Association for Computational Linguistics, 2023

ISBN

978-1-959429-87-6

Volume/pages

1 (2023), p. 251-270

Full text (Publisher's DOI)

https://doi.org/10.18653/V1/2023.WASSA-1.23

Full text (open access)

Licensed under a CC BY Attribution license

Faculty/Department

Faculty of Arts. Linguistics

Research group

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Publication type

P3 Proceeding

Subject

Linguistics

Affiliation

Publications with a UAntwerp address

Creation

20.02.2024

Last edited

17.06.2024

To cite this reference

https://hdl.handle.net/10067/2032100151162165141