Publication
Title
How to optimize your Twitter collection
Author
Abstract
Twitter allows API calls to retrieve one percent of all tweets at any time using a search word list. Since some languages, including Dutch, make up less than one percent of all tweets on average, a large part can be retrieved using the right keywords. This paper systematically assesses keyword lists for finding language-specific tweets. It contributes comparisons to previously suggested collection methods for the Dutch language and establishes the limitations of each. Generating keywords from Dutch tweets and picking 400 based on their precision-weighted recall achieves the best coverage at 91.3%. The list of Dutch keywords is made openly available alongside the code that can be used to generate lists for the collection of other languages or for other tasks that benefit from early filtering such as event or hate speech detection.
Language
English
Source (journal)
Computational linguistics in the Netherlands journal
Publication
2019
Volume/pages
9 (2019), p. 55-66
Full text (open access)
UAntwerpen
Faculty/Department
Research group
Project info
How political news affects and is affected by citizens in the social media age. Theoretical challenges and empirical opportunities
Publication type
Subject
Affiliation
Publications with a UAntwerp address
External links
VABB-SHW
Record
Identifier
Creation 25.02.2020
Last edited 07.10.2022