Exploring classifier combinations for language variety identification

Bibliographic details
Title: Exploring classifier combinations for language variety identification
Authors: Tim Kreutz, Walter Daelemans
Source: University of Antwerp
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), Santa Fe, New Mexico, USA
Subject terms: Computer. Automation, ComputingMethodologies_PATTERNRECOGNITION, Linguistics
Description: This paper describes CLiPS’s submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018. We explore different ways to combine classifiers trained on different feature groups. Our best system uses two Linear SVM classifiers: one trained on lexical features (word n-grams) and one trained on syntactic features (PoS n-grams). The final prediction for a document to be in Flemish Dutch or Netherlandic Dutch is made by the classifier that outputs the highest probability for one of the two labels. This confidence vote approach outperforms a meta-classifier on the development data and on the test data.
ISBN: 978-1-945626-43-2
Access URL: https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::52c1cb60fceacef495e43b7d644f059b
https://hdl.handle.net/10067/1565880151162165141
Rights: OPEN
Accession number: edsair.dedup.wf.001..52c1cb60fceacef495e43b7d644f059b
Database: OpenAIRE
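
The confidence vote mentioned in the description above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each of the two classifiers has already produced a probability for each label; how the paper derives probabilities from a Linear SVM is not stated in this record. The function name `confidence_vote`, the label strings "Flemish" and "Netherlandic", and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

# Maps each label to the probability one classifier assigns to it.
Probs = Dict[str, float]


def confidence_vote(lexical: Probs, syntactic: Probs) -> Tuple[str, str]:
    """Return (label, source) from whichever classifier is most confident.

    `lexical` and `syntactic` are assumed to be the per-label probabilities
    from the word n-gram and PoS n-gram classifiers respectively.
    """
    candidates = {"lexical": lexical, "syntactic": syntactic}
    # For each classifier, find its top label and that label's probability.
    best = {
        name: max(probs.items(), key=lambda kv: kv[1])
        for name, probs in candidates.items()
    }
    # Choose the classifier whose top probability is highest.
    # Assumption: on a tie, the lexical classifier wins (it is listed first).
    source = max(best, key=lambda name: best[name][1])
    return best[source][0], source


if __name__ == "__main__":
    # Hypothetical probabilities for a single subtitle document.
    lexical = {"Flemish": 0.55, "Netherlandic": 0.45}
    syntactic = {"Flemish": 0.20, "Netherlandic": 0.80}
    print(confidence_vote(lexical, syntactic))
```

In this example the syntactic classifier is more certain (0.80 against 0.55), so its label is used. A meta-classifier, the alternative the description compares against, would instead learn how to combine the two outputs from training data.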