Publication
Title
A Deep Generative Approach to Native Language Identification
Author
Abstract
Native language identification (NLI) – identifying the native language (L1) of a person based on his/her writing in the second language (L2) – is useful for a variety of purposes, including marketing, security, and educational applications. From a traditional machine learning perspective, NLI is usually framed as a multi-class classification task, where numerous designed features are combined in order to achieve state-of-the-art results. We introduce a deep generative language modelling (LM) approach to NLI, which consists in fine-tuning a GPT-2 model separately on texts written by authors with the same L1, and assigning a label to an unseen text based on the minimum LM loss with respect to one of these fine-tuned GPT-2 models. Our method outperforms traditional machine learning approaches and currently achieves the best results on the benchmark NLI datasets.
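The decision rule described in the abstract (one language model per L1, label assigned by minimum LM loss) can be illustrated with a minimal sketch. This is not the authors' implementation: a toy add-one-smoothed character unigram model stands in for each fine-tuned GPT-2 model, and the names `ToyCharLM`, `train_models`, and `identify_l1` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


class ToyCharLM:
    """Add-one-smoothed character unigram model.

    ASSUMPTION: this is only a stand-in for a GPT-2 model fine-tuned on
    texts from one L1 group; it is not the model used in the paper.
    """

    def __init__(self, texts, alphabet):
        self.counts = Counter(ch for t in texts for ch in t)
        self.total = sum(self.counts.values())
        # One extra slot is reserved for characters never seen in training.
        self.vocab_size = len(alphabet) + 1

    def loss(self, text):
        """Mean negative log-likelihood per character (lower = better fit)."""
        nll = 0.0
        for ch in text:
            p = (self.counts[ch] + 1) / (self.total + self.vocab_size)
            nll -= math.log(p)
        return nll / max(len(text), 1)


def train_models(texts_by_l1):
    """Fit one language model per L1 label on that group's texts only."""
    alphabet = {ch for texts in texts_by_l1.values() for t in texts for ch in t}
    return {l1: ToyCharLM(texts, alphabet) for l1, texts in texts_by_l1.items()}


def identify_l1(text, models):
    """Score the text under every per-L1 model and return the argmin label."""
    losses = {l1: model.loss(text) for l1, model in models.items()}
    return min(losses, key=losses.get), losses
```

With real models, `loss` would instead be the average token-level cross-entropy of the unseen text under each fine-tuned model; the argmin step stays the same.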
Language
English
Source (book)
Proceedings of the 28th International Conference on Computational Linguistics
Publication
International Committee on Computational Linguistics, 2020
ISBN
978-1-952148-28-6
DOI
10.18653/V1/2020.COLING-MAIN.159
Volume/pages
p. 1778-1783
UAntwerpen
Faculty/Department
Research group
Publication type
Subject
Affiliation
Publications with a UAntwerp address
External links
VABB-SHW
Record
Identifier
Creation 22.01.2021
Last edited 22.08.2024