Title
A Deep Generative Approach to Native Language Identification
Author
Abstract
Native language identification (NLI), the task of identifying a person's native language (L1) from their writing in a second language (L2), is useful for a variety of purposes, including marketing, security, and educational applications. From a traditional machine learning perspective, NLI is usually framed as a multi-class classification task in which numerous designed features are combined to achieve state-of-the-art results. We introduce a deep generative language modelling (LM) approach to NLI, which consists of fine-tuning a separate GPT-2 model on texts written by authors who share the same L1, and assigning a label to an unseen text according to which of these fine-tuned GPT-2 models yields the minimum LM loss. Our method outperforms traditional machine learning approaches and currently achieves the best results on the benchmark NLI datasets.
Language
English
Source (book)
Proceedings of the 28th International Conference on Computational Linguistics
Publication
International Committee on Computational Linguistics, 2020
ISBN
978-1-952148-28-6
DOI
10.18653/V1/2020.COLING-MAIN.159
Volume/pages
p. 1778-1783
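The decision rule described in the abstract (one language model per L1, with the label given by the model that yields the lowest LM loss on the unseen text) can be illustrated with a minimal sketch. This is not the authors' implementation: an add-one-smoothed unigram model is swapped in for the fine-tuned GPT-2 models so that the example runs with the standard library alone. The class `UnigramLM`, the functions `train_per_l1` and `predict_l1`, and the two-language toy corpus are all hypothetical and invented purely for illustration.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy stand-in for a fine-tuned GPT-2 (assumption): a smoothed unigram model."""

    def __init__(self, texts, vocab):
        self.vocab = vocab
        self.counts = Counter(tok for t in texts for tok in t.lower().split())
        self.total = sum(self.counts.values())

    def loss(self, text):
        """Mean negative log-likelihood per token, playing the role of the LM loss."""
        tokens = text.lower().split()
        if not tokens:
            raise ValueError("cannot score an empty text")
        slots = len(self.vocab) + 1  # one extra slot shared by unseen tokens
        nll = 0.0
        for tok in tokens:
            prob = (self.counts.get(tok, 0) + 1) / (self.total + slots)
            nll -= math.log(prob)
        return nll / len(tokens)


def train_per_l1(corpus_by_l1):
    """Fit one model per L1 on the texts written by authors with that L1."""
    vocab = {
        tok
        for texts in corpus_by_l1.values()
        for t in texts
        for tok in t.lower().split()
    }
    return {l1: UnigramLM(texts, vocab) for l1, texts in corpus_by_l1.items()}


def predict_l1(models, text):
    """Assign the L1 whose model gives the minimum loss on the text."""
    return min(models, key=lambda l1: models[l1].loss(text))


# Invented toy data, not taken from any NLI benchmark.
toy_corpus = {
    "DE": ["i have since two years english learned",
           "we have yesterday the film seen"],
    "ES": ["i have twenty years and i am agree",
           "the people is very happy in the spain"],
}
models = train_per_l1(toy_corpus)
print(predict_l1(models, "she has yesterday the book seen"))
```

Under the same assumption, replacing `UnigramLM.loss` with the per-token cross-entropy of a GPT-2 model fine-tuned on each L1 subset would leave `predict_l1` unchanged, since the rule only compares one scalar loss per candidate L1.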