How to optimize your Twitter collection

Bibliographic details
Title: How to optimize your Twitter collection
Authors: Tim Kreutz, Walter Daelemans
Source: University of Antwerp
Computational Linguistics in the Netherlands Journal
Subject terms: InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Linguistics
Description: Twitter's API allows up to one percent of all tweets to be retrieved at any time using a list of search words. Because some languages, including Dutch, make up less than one percent of all tweets on average, a large share of their tweets can be retrieved with the right keywords. This paper systematically assesses keyword lists for finding language-specific tweets. It compares previously suggested collection methods for Dutch and establishes the limitations of each. Generating keywords from Dutch tweets and picking 400 of them by precision-weighted recall achieves the best coverage, at 91.3%. The list of Dutch keywords is openly available, together with the code that can generate lists for collecting other languages or for other tasks that benefit from early filtering, such as event detection or hate speech detection.
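As an illustration only, the keyword-selection step described above can be sketched as a ranking of candidate words. This record does not define "precision-weighted recall", so the sketch assumes a simple stand-in: a word's score is its precision (the share of tweets containing it that are in the target language) multiplied by its recall (the share of target-language tweets that contain it). The whitespace tokenization, the function names `score_keywords` and `pick_keywords`, and the scoring formula are all hypothetical and are not taken from the authors' released code.

```python
from collections import Counter


def document_frequencies(tweets):
    """Count in how many tweets each lowercased, whitespace-split token occurs."""
    df = Counter()
    for tweet in tweets:
        df.update(set(tweet.lower().split()))
    return df


def score_keywords(target_tweets, other_tweets):
    """Score each token seen in target tweets by precision * recall.

    Assumption: this product is a hypothetical stand-in for the paper's
    'precision-weighted recall', whose exact definition is not given here.
    """
    target_df = document_frequencies(target_tweets)
    other_df = document_frequencies(other_tweets)
    n_target = len(target_tweets)
    scores = {}
    for token, hits in target_df.items():
        precision = hits / (hits + other_df[token])  # share of matches that are target-language
        recall = hits / n_target                     # share of target tweets matched
        scores[token] = precision * recall
    return scores


def pick_keywords(target_tweets, other_tweets, k=400):
    """Return the k highest-scoring tokens; ties are broken alphabetically."""
    scores = score_keywords(target_tweets, other_tweets)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    dutch = ["ik ben thuis", "ik heb honger", "het is koud"]
    other = ["i am home", "is it cold"]
    print(pick_keywords(dutch, other, k=3))
```

In this toy run, "ik" ranks first because it appears only in Dutch tweets and in two of the three of them, while "is" ranks last because it also appears in the non-Dutch sample. Ranking by a single combined score keeps the sketch short. A real system might instead choose keywords greedily, so that each new word covers tweets the earlier words missed.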
File description: PDF
ISSN: 2211-4009
Access URL: https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::8c50ea3ad62d07119aea02b654239908
https://hdl.handle.net/10067/1661910151162165141
Rights: OPEN
Accession number: edsair.dedup.wf.001..8c50ea3ad62d07119aea02b654239908
Database: OpenAIRE