Publication
Title
OSACT4 shared tasks : ensembled stacked classification for offensive and hate speech in Arabic tweets
Author
Abstract
In this paper, we describe our submission to the OSACT4 2020 shared tasks on offensive language and hate speech detection in Arabic. Our solution combines a number of deep learning models that use pre-trained word vectors. To improve word representations and increase vocabulary coverage, we compare several existing pre-trained word embeddings and concatenate the two that perform best empirically. To avoid both under- and over-fitting, we train each deep model multiple times and include the optimization of the decision threshold in the training process. The predictions of the resulting models are then combined into a tuned ensemble by stacking a classifier on top of the predictions of these base models. We name our approach “ESOTP” (Ensembled Stacking classifier over Optimized Thresholded Predictions of multiple deep models). The resulting ESOTP-based system ranked 6th out of 35 on the Offensive Language Detection shared task (sub-task A) and 5th out of 30 on Hate Speech Detection (sub-task B).
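The abstract only outlines ESOTP. What follows is a minimal sketch of its last two stages, written under assumptions the abstract does not confirm. Each trained run of a deep model is treated as one base model. Each run's decision threshold is chosen by grid search to maximise macro-F1 on held-out data; this is a post-hoc simplification of "threshold optimization inside training", and macro-F1 is an assumed criterion. The stacked classifier is assumed to be a plain logistic regression. All function names (`best_threshold`, `thresholded_features`, `train_meta_classifier`, `predict_meta`) are hypothetical, and the toy probabilities are invented for illustration, not results from the paper.

```python
import math
from typing import List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Macro-averaged F1 over the binary labels 0 and 1."""
    scores = []
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_threshold(probs: Sequence[float], y_true: Sequence[int]) -> float:
    """Grid-search the decision threshold that maximises macro-F1 (assumed criterion)."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: macro_f1(y_true, [int(p >= t) for p in probs]))


def thresholded_features(model_probs: Sequence[Sequence[float]],
                         thresholds: Sequence[float]) -> List[List[int]]:
    """Row i holds the 0/1 vote of every base model for example i.

    model_probs[m][i] is base model m's positive-class probability for example i;
    each model is binarised with its own tuned threshold.
    """
    n_examples = len(model_probs[0])
    return [[int(probs[i] >= t) for probs, t in zip(model_probs, thresholds)]
            for i in range(n_examples)]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_classifier(X: List[List[int]], y: Sequence[int],
                          lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Assumed meta-classifier: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w: List[float], b: float, X: List[List[int]]) -> List[int]:
    """Final ensemble label: 1 if the meta-classifier's logit is non-negative."""
    return [int(b + sum(wj * xj for wj, xj in zip(w, xi)) >= 0.0) for xi in X]


if __name__ == "__main__":
    # Invented held-out probabilities from three hypothetical base-model runs.
    dev_probs = [[0.9, 0.2, 0.7, 0.1], [0.6, 0.4, 0.8, 0.3], [0.55, 0.45, 0.65, 0.2]]
    dev_labels = [1, 0, 1, 0]
    thresholds = [best_threshold(p, dev_labels) for p in dev_probs]
    X = thresholded_features(dev_probs, thresholds)
    w, b = train_meta_classifier(X, dev_labels)
    print(thresholds, predict_meta(w, b, X))
```

For brevity the sketch tunes thresholds and fits the meta-classifier on the same held-out predictions. A more careful setup would fit the stacker on out-of-fold predictions so that it does not simply memorise the data the thresholds were tuned on.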
Language
English
Source (book)
4th Workshop on Open-Source Arabic Corpora and Processing Tools
Publication
European Language Resources Association, 2020
Volume/pages
(2020), p. 71-75
Note
Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020
Full text (open access)
UAntwerpen
Affiliation
Publications with a UAntwerp address
Record
Creation 05.08.2021
Last edited 17.06.2024