Publication
Title
Benchmarking zero-shot text classification for Dutch
Author
Abstract
The advent and popularisation of Large Language Models (LLMs) have given rise to prompt-based Natural Language Processing (NLP) techniques which eliminate the need for large manually annotated corpora and computationally expensive supervised training or fine-tuning processes. Zero-shot learning in particular presents itself as an attractive alternative to the classical train-development-test paradigm for many downstream tasks as it provides a quick and inexpensive way of directly leveraging the implicitly encoded knowledge in LLMs. Despite the large interest in zero-shot applications within the domain of NLP as a whole, there is often no consensus on the methodology, analysis and evaluation of zero-shot pipelines. As a tentative step towards finding such a consensus, this work provides a detailed overview of available methods, resources, and caveats for zero-shot prompting within the Dutch language domain. At the same time, we present centralised zero-shot benchmark results on a large variety of Dutch NLP tasks using a series of standardised datasets. These tasks vary in subjectivity and domain, ranging from more social information extraction tasks (sentiment, emotion and irony detection for social media) to factual tasks (news topic classification and event coreference resolution). To ensure that the benchmark results are representative, we investigated a selection of zero-shot methodologies for a variety of state-of-the-art Dutch Natural Language Inference (NLI) models, Masked Language Models (MLM), and autoregressive language models. The output on each test set was compared to the best performance achieved using supervised methods. Our findings indicate that task-specific fine-tuning delivers superior performance in all tasks but one (emotion detection). In the zero-shot setting, prompted large generative models appear to outperform NLI models, which in turn perform better than the MLM approach.
Finally, we note several caveats and challenges tied to using zero-shot learning in application settings. These include, but are not limited to, properly streamlining evaluation of zero-shot output, parameter efficiency compared to standard fine-tuned models and prompt optimisation.
Language
English
Source (journal)
Computational Linguistics in the Netherlands Journal
Publication
2024
Volume/pages
13 (2024), p. 63-90
Medium
E-only publication
Full text (open access)
UAntwerpen
Faculty/Department
Research group
Publication type
Subject
Affiliation
Publications with a UAntwerp address
Creation 29.04.2024
Last edited 09.10.2024