Publication
Title
Semantic classification of Dutch noun-noun compounds : a distributional semantics approach
Abstract
This article describes the first attempt to semantically analyse Dutch noun-noun compounds using the distributional hypothesis, which states that the semantics of a word is implicitly represented by the words in its context. The purpose is not only to classify compounds based on their semantics; we also investigate under what circumstances this classification works best. Using Ó Séaghdha (2008) as a source of inspiration, a list of 1,802 noun-noun compounds was collected and annotated. The annotators had an annotation scheme and guidelines available, with six specific semantic categories (BE, HAVE, IN, ACTOR, INST, ABOUT) and five categories for less specific or incorrect compounds. An inter-annotator agreement of 60.2% was found on a 500-compound subset. The task of automatically analysing compound semantics was framed as a classification task for which supervised machine learning algorithms can be used. The instance vectors were created by concatenating the vectors containing co-occurrence information on the compound constituents. In certain variants of the experiment, principal component analysis (PCA) was used to reduce the dimensionality of the dataset. Support vector machines (SVM) and instance-based learning were used for the machine learning experiments. A maximum F-score of 49.0% was reached on the normal bag-of-words (BOW) data using the SVM algorithm. The PCA data yielded a maximum F-score of 45.2%. These scores should be compared with a most-frequent-class baseline of 29.5%. The results in both main variants significantly outperform this baseline.
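The instance construction and the baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English corpus, the context vocabulary, the window size of 2, the use of raw counts, and all function names are assumptions made for illustration only. The abstract does not specify any of these details.

```python
from collections import Counter


def cooccurrence_vector(word, sentences, vocabulary, window=2):
    """Count how often each vocabulary word occurs within `window`
    tokens of `word`. The window size and raw counts are assumptions."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != word:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in vocabulary:
                    counts[tokens[j]] += 1
    # Fixed vocabulary order gives every word a vector of the same shape.
    return [counts[v] for v in vocabulary]


def instance_vector(modifier, head, sentences, vocabulary):
    """Concatenate the co-occurrence vectors of the two constituents."""
    return (cooccurrence_vector(modifier, sentences, vocabulary)
            + cooccurrence_vector(head, sentences, vocabulary))


def most_frequent_class_baseline(labels):
    """Return the majority label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy data, not taken from the article.
sentences = [
    ["the", "kitchen", "table", "is", "old"],
    ["a", "table", "in", "the", "kitchen"],
]
vocabulary = ["the", "in", "old", "is"]

vector = instance_vector("kitchen", "table", sentences, vocabulary)
baseline = most_frequent_class_baseline(["IN", "IN", "BE", "HAVE"])
```

In this sketch the instance vector is twice the vocabulary size, one half per constituent. Vectors of this kind could then be passed to a classifier such as an SVM, optionally after dimensionality reduction with PCA, as the abstract describes.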
Language
English
Source (journal)
Computational linguistics in the Netherlands journal
Publication year
2013
Volume/pages
3 (2013), p. 2-18
UAntwerpen
Affiliation
Publications with a UAntwerp address
External links
VABB-SHW
Record
Creation 20.02.2014
Last edited 07.10.2022