Benchmarking zero-shot text classification for Dutch

De Langhe, Loic; Maladry, Aaron; Vanroy, Bram; De Bruyne, Luna; Singh, Pranaydeep; Lefever, Els; De Clercq, Orphée

Abstract

The advent and popularisation of Large Language Models (LLMs) have given rise to prompt-based Natural Language Processing (NLP) techniques that eliminate the need for large manually annotated corpora and computationally expensive supervised training or fine-tuning. Zero-shot learning in particular presents itself as an attractive alternative to the classical train-development-test paradigm for many downstream tasks, as it offers a quick and inexpensive way of directly leveraging the knowledge implicitly encoded in LLMs. Despite the large interest in zero-shot applications within NLP as a whole, there is often no consensus on the methodology, analysis and evaluation of zero-shot pipelines. As a tentative step towards such a consensus, this work provides a detailed overview of available methods, resources, and caveats for zero-shot prompting for Dutch. At the same time, we present centralised zero-shot benchmark results on a wide variety of Dutch NLP tasks using a series of standardised datasets. These tasks vary in subjectivity and domain, ranging from more social information extraction tasks (sentiment, emotion and irony detection for social media) to factual tasks (news topic classification and event coreference resolution). To ensure that the benchmark results are representative, we investigated a selection of zero-shot methodologies for a variety of state-of-the-art Dutch Natural Language Inference (NLI) models, Masked Language Models (MLMs), and autoregressive language models. The output on each test set was compared to the best performance achieved using supervised methods. Our findings indicate that task-specific fine-tuning delivers superior performance on all tasks but one (emotion detection). In the zero-shot setting, prompted large generative models seem to outperform NLI models, which in turn perform better than the MLM approach.
Finally, we note several caveats and challenges tied to using zero-shot learning in application settings. These include, but are not limited to, properly streamlining the evaluation of zero-shot output, parameter efficiency compared to standard fine-tuned models, and prompt optimisation.
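The abstract contrasts NLI-based, MLM-based and prompt-based zero-shot methods without spelling out how they work. As an illustration only, NLI-based zero-shot classification is commonly implemented by turning each candidate label into a hypothesis and ranking labels by how strongly the input text entails it. The sketch below assumes a hypothetical `score_entailment(premise, hypothesis)` function standing in for a Dutch NLI model and a hypothetical Dutch hypothesis template ("Deze tekst gaat over {}.", i.e. "This text is about {}."); it is not the authors' pipeline, and `toy_scorer` exists only to make the example runnable.

```python
from typing import Callable, Dict, List

# Hypothetical interface: maps (premise, hypothesis) to P(entailment) in [0, 1].
# In practice this would wrap a Dutch NLI model; here it is left abstract.
EntailmentScorer = Callable[[str, str], float]

# Hypothetical hypothesis template ("This text is about {}.").
DEFAULT_TEMPLATE = "Deze tekst gaat over {}."


def zero_shot_nli(
    text: str,
    labels: List[str],
    score_entailment: EntailmentScorer,
    template: str = DEFAULT_TEMPLATE,
) -> Dict[str, float]:
    """Score every label by the entailment of its hypothesis, normalised to sum to 1."""
    raw = {label: score_entailment(text, template.format(label)) for label in labels}
    total = sum(raw.values())
    if total == 0:
        # No hypothesis is entailed at all: fall back to a uniform distribution.
        return {label: 1.0 / len(labels) for label in labels}
    return {label: score / total for label, score in raw.items()}


def predict(
    text: str,
    labels: List[str],
    score_entailment: EntailmentScorer,
    template: str = DEFAULT_TEMPLATE,
) -> str:
    """Return the label whose hypothesis receives the highest normalised score."""
    scores = zero_shot_nli(text, labels, score_entailment, template)
    return max(scores, key=scores.get)


def toy_scorer(premise: str, hypothesis: str) -> float:
    # Toy stand-in for an NLI model: the hypothesis counts as "entailed" when
    # its final word (the label) also occurs in the lowercased premise.
    label = hypothesis.rstrip(".").split()[-1]
    return 0.9 if label in premise.lower() else 0.1


if __name__ == "__main__":
    text = "Ajax won gisteren de wedstrijd, een mooie dag voor de sport."
    labels = ["sport", "politiek", "economie"]
    print(zero_shot_nli(text, labels, toy_scorer))
    print(predict(text, labels, toy_scorer))
```

Normalising raw entailment probabilities is one simple design choice; other implementations instead apply a softmax over entailment logits, or score each label independently for multi-label tasks.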

Language

English

Source (journal)

Computational Linguistics in the Netherlands Journal

Publication

2024

Volume/pages

13 (2024), p. 63-90

Medium

E-only publication

Full text (open access)

https://repository.uantwerpen.be/docstore/d:irua:23151

Faculty/Department

Faculty of Arts. Linguistics

Research group

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Publication type

A1 Journal article

Subject

Computer. Automation; Linguistics

Affiliation

Publications with a UAntwerp address

Source file

https://www.clinjournal.org/clinj/article/view/172

Creation

29.04.2024

Last edited

09.10.2024

To cite this reference

https://hdl.handle.net/10067/2053620151162165141