Title
OSACT4 shared tasks: ensembled stacked classification for offensive and hate speech in Arabic tweets
Abstract
In this paper, we describe our submission to the OSACT4 2020 shared tasks on offensive language and hate speech detection in Arabic. Our solution combines a number of deep learning models built on pre-trained word vectors. To improve the word representation and increase word coverage, we compare several existing pre-trained word embeddings and concatenate the two that perform best empirically. To avoid both underfitting and overfitting, we train each deep model multiple times and include the optimization of the decision threshold in the training process. The predictions of the resulting models are then combined into a tuned ensemble by stacking a classifier on top of the predictions of these base models. We name our approach “ESOTP” (Ensembled Stacking classifier over Optimized Thresholded Predictions of multiple deep models). The resulting ESOTP-based system ranked 6th out of 35 on offensive language detection (sub-task A) and 5th out of 30 on hate speech detection (sub-task B).
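The abstract describes the ESOTP pipeline only at a high level. The Python sketch below illustrates one possible reading of its last two stages, under explicit assumptions: each base run's decision threshold is tuned to maximize macro-averaged F1 (an assumed tuning metric), the thresholded 0/1 outputs of all runs become features (as the name "Optimized Thresholded Predictions" suggests), and a single logistic-regression meta-learner is stacked on top with its own tuned threshold. The function names (`tune_threshold`, `train_logreg`, `esotp_sketch`), the logistic-regression meta-learner, and the toy probabilities are hypothetical illustrations, not the authors' code; the deep base models, the concatenated embeddings, and any ensembling of several stacked classifiers are not modelled. For brevity the toy example fits thresholds and the meta-learner on the same six examples, whereas a real system would fit them on held-out (development or out-of-fold) predictions.

```python
import math
from typing import List, Sequence, Tuple


def class_f1(y_true: Sequence[int], y_pred: Sequence[int], c: int) -> float:
    """F1 score of class c, where labels are 0 or 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == c and p == c)
    fp = sum(1 for t, p in pairs if t != c and p == c)
    fn = sum(1 for t, p in pairs if t == c and p != c)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the two per-class F1 scores (assumed tuning metric)."""
    return (class_f1(y_true, y_pred, 0) + class_f1(y_true, y_pred, 1)) / 2


def tune_threshold(y_true: Sequence[int], probs: Sequence[float]) -> Tuple[float, float]:
    """Return (t, score) where predicting 1 iff p >= t maximizes macro-F1.

    The observed scores are used as candidates, so the search is exhaustive."""
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(probs)):
        score = macro_f1(y_true, [int(p >= t) for p in probs])
        if score > best_score:  # ties keep the lowest threshold
            best_t, best_score = t, score
    return best_t, best_score


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Hypothetical meta-learner: logistic regression fit by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def esotp_sketch(y: Sequence[int], run_probs: List[List[float]]):
    """Threshold each base run, stack a meta-learner on the 0/1 outputs,
    then tune the meta-learner's own decision threshold."""
    base = [tune_threshold(y, probs) for probs in run_probs]
    # one 0/1 feature per base run for every example
    feats = [[float(probs[i] >= t) for probs, (t, _) in zip(run_probs, base)]
             for i in range(len(y))]
    w, b = train_logreg(feats, y)
    meta = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in feats]
    t_meta, meta_score = tune_threshold(y, meta)
    preds = [int(p >= t_meta) for p in meta]
    return [score for _, score in base], meta_score, preds


# Toy data: three runs, each missing a different positive example.
y = [1, 1, 1, 0, 0, 0]
runs = [
    [0.1, 0.8, 0.9, 0.3, 0.2, 0.25],
    [0.8, 0.1, 0.9, 0.3, 0.2, 0.25],
    [0.8, 0.9, 0.1, 0.3, 0.2, 0.25],
]
base_scores, stacked_score, preds = esotp_sketch(y, runs)
print([round(s, 3) for s in base_scores], stacked_score, preds)
# expected: [0.829, 0.829, 0.829] 1.0 [1, 1, 1, 0, 0, 0]
```

In this toy run each base run, even with its own tuned threshold, misclassifies a different positive example and reaches a macro-F1 of about 0.829, while the stacked meta-learner combines the runs' complementary errors and separates all six examples. Using the observed scores as threshold candidates, rather than a fixed grid, keeps the threshold search exact for a given set of predictions.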
Language
English
Source (book)
4th Workshop on Open-Source Arabic Corpora and Processing Tools
Publication
European Language Resources Association, 2020
Volume/pages
(2020), p. 71–75
Note
Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020
Full text (open access)