Title
Data-Driven Syllabification for Middle Dutch
Author
Abstract
The task of automatically separating Middle Dutch words into syllables is a challenging one. A first method was presented by Bouma and Hermans (2012), who combined a rule-based finite-state component with data-driven error correction. With an average word accuracy of 96.5%, their system is satisfactory, although it leaves room for improvement. Generally speaking, rule-based methods are less attractive for a medieval language like Middle Dutch, where not only does each dialect have its own spelling preferences, but there is also much idiosyncratic variation among scribes. This paper presents a different method for automatically syllabifying Middle Dutch words, one that does not rely on pre-defined linguistic information. Using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells, we obtain a system that outperforms the rule-based method in both robustness and the effort required.
Language
English
Source (journal)
Digital medievalist / University of Lethbridge. - Lethbridge, Alta, 2005, currens
Publication
Lethbridge, Alta: University of Lethbridge, 2019
ISSN
1715-0736
DOI
10.16995/DM.83
Volume/pages
12:1,2 (2019), p. 1-23