Semantic classification of Dutch noun-noun compounds : a distributional semantics approach

Verhoeven, Ben; Daelemans, Walter

Author

Verhoeven, Ben

Daelemans, Walter

Abstract

This article describes the first attempt to semantically analyse Dutch noun-noun compounds using the distributional hypothesis, which states that the semantics of a word is implicitly represented by the words in its context. The purpose is not only to classify compounds based on their semantics. We also investigate in what circumstances this classification works best. Using Ó Séaghdha (2008) as a source of inspiration, a list of 1,802 noun-noun compounds was collected and annotated. The annotators had an annotation scheme and guidelines available with six specific semantic categories (BE, HAVE, IN, ACTOR, INST, ABOUT) and five categories for less specific categories or incorrect compounds. An inter-annotator agreement of 60.2% was found on a 500 compound subset. The task of automatically analysing compound semantics was framed as a classification task for which we can use supervised machine learning algorithms. The instance vectors were created by concatenating the vectors containing co-occurrence information on the compound constituents. In certain variants of the experiment, principal component analysis (PCA) was used as a means of reducing the dimensionality of the dataset. Support vector machines and instance-based learning were used for the machine learning experiments. A maximum F-score of 49.0% was reached on the normal bag-of-words (BOW) data using the SVM algorithm. The PCA data yielded a maximum F-score of 45.2%. These scores should be compared with a most frequent class baseline of 29.5%. The achieved results in both main variants significantly outperform this baseline.

Language

English

Source (journal)

Computational linguistics in the Netherlands journal

Publication

2013

Volume/pages

3 (2013), p. 2-18

Full text (publisher's version - intranet only)

https://repository.uantwerpen.be/docman/iruaauth/e77a75/00c0b69ae5a.pdf

Faculty/Department				Faculty of Arts. Linguistics

Research group				Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)
Publication type				A1 Journal article

Subject				Linguistics

Affiliation				Publications with a UAntwerp address

Creation

20.02.2014

Last edited

07.10.2022

To cite this reference

https://hdl.handle.net/10067/1141830151162165141