Publication
Title
The automated detection of racist discourse in Dutch social media
Author
Abstract
We present two experiments on the automated detection of racist discourse in Dutch social media. In both experiments, multiple classifiers are trained on the same training set. This training set consists of Dutch posts retrieved from two public Belgian social media pages which are likely to attract racist reactions. The posts were labeled as racist or non-racist by multiple annotators, who reached an acceptable agreement score. The different classification models all use the Support Vector Machine algorithm, but use different (sets of) linguistic features, which can be lexical, stylistic or dictionary-based. In the first experiment, the models are evaluated on a test set containing unseen comments retrieved from the same pages as the training set (and thus also skewed towards racism). In the second experiment, the same models from Experiment 1 are tested on an alternative test set, containing more neutral comments, retrieved from the social media page of a Belgian newspaper. In both experiments, the best performing model relies on a dictionary containing different word categories specifically related to racist discourse. It reaches an F-score of 0.47 (exp. 1) and 0.40 (exp. 2) for the racist class and ROC Area Under Curve scores of 0.64 (exp. 1) and 0.73 (exp. 2). The dictionaries, code, and the procedure for requesting the corpus are available at: https://github.com/clips/hades.
Language
English
Source (journal)
Computational Linguistics in the Netherlands Journal
Publication
2016
Volume/pages
6:1 (2016), p. 3-20
Full text (open access)
UAntwerpen
Faculty/Department
Research group
Publication type
Subject
Affiliation
Publications with a UAntwerp address
External links
VABB-SHW
Record
Identifier
Creation 10.01.2017
Last edited 07.10.2022