Publication
Title
Fine-grained classification of social science journal articles using textual data : a comparison of supervised machine learning approaches
Author
Abstract
We compare two supervised machine learning algorithms, Multinomial Naïve Bayes and Gradient Boosting, for classifying social science articles using textual data. The task is challenging because the classification scheme is highly granular and a document may be assigned multiple categories. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chaining model, allowing for an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. Because the approach does not rely on citation data, it can be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social science publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social science documents.
Language
English
Source (journal)
Quantitative science studies / International Society for Scientometrics and Informetrics. - Cambridge, MA, 2020, currens
Publication
Cambridge, MA : The MIT Press , 2021
ISSN
2641-3337
DOI
10.1162/QSS_A_00106
Volume/pages
2 :1 (2021) , p. 89-110
ISI
000697445300005
UAntwerpen
Affiliation
Publications with a UAntwerp address
Record
Creation 25.01.2021
Last edited 26.08.2024