Title
Weigh your words : memory-based lemmatization for Middle Dutch Weigh your words : memory-based lemmatization for Middle Dutch
Author
Faculty/Department
Faculty of Arts. Linguistics and Literature
Publication type
article
Publication
Oxford ,
Subject
Computer. Automation
Linguistics
Source (journal)
Literary and linguistic computing. - Oxford, 1986 - 2014
Volume/pages
25(2010) :3 , p. 287-301
ISSN
0268-1145
1477-4615
ISI
000284758900002
Carrier
E
Target language
English (eng)
Full text (Publishers DOI)
Affiliation
University of Antwerp
Abstract
This article deals with the lemmatization of Middle Dutch literature. This text collectionlike any other medieval corpusis characterized by an enormous spelling variation, which makes it difficult to perform a computational analysis of this kind of data. Lemmatization is therefore an essential preprocessing step in many applications, since it allows the abstraction from superficial textual variation, for instance in spelling. The data we will work with is the Corpus-Gysseling, containing all surviving Middle Dutch literary manuscripts dated before 1300 AD. In this article we shall present a language-independent system that can learn intra-lemma spelling variation. We describe a series of experiments with this system, using Memory-Based Machine Learning and propose two solutions for the lemmatization of our data: the first procedure attempts to generate new spelling variants, the second one seeks to implement a novel string distance metric to better detect spelling variants. The latter system attempts to rerank candidates suggested by a classic Levenshtein distance, leading to a substantial gain in lemmatization accuracy. This research result is encouraging and means a substantial step forward in the computational study of Middle Dutch literature. Our techniques might be of interest to other research domains as well because of their language-independent nature.
E-info
https://repository.uantwerpen.be/docman/iruaauth/3fb663/eced7745754.pdf
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000284758900002&DestLinkType=RelatedRecords&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000284758900002&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000284758900002&DestLinkType=CitingArticles&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
Handle