E-book

Introducing a Gold Standard Corpus from Young Multilinguals for the Evaluation of Automatic UD-PoS Taggers for Italian

Bibliographic details
Title: Introducing a Gold Standard Corpus from Young Multilinguals for the Evaluation of Automatic UD-PoS Taggers for Italian
Authors: Schmalz, Veronica Juliana; Frey, Jennifer-Carmen; Stemle, Egon W.
Source: Accademia University Press; OpenAIRE.
Publisher: Accademia University Press.
Publication year: 2022
Physical description: pp. 300-306
Subject terms: Linguistics, Computational Linguistics, Language
Description: Part-of-speech (PoS) tagging is a common task in Natural Language Processing (NLP) because of its wide applicability. However, with the advance of new information technologies and with language variation, the content and methods of PoS tagging have changed. Most existing Italian data for this task come from standard texts, where language use is far removed from the multifaceted, informal situations of real life. Automatic PoS-tagging models trained on such data do not perform reliably on non-standard language, such as social media content or texts written by language learners. Our aim is to provide additional training and evaluation data from language learners, tagged in Universal Dependencies (UD), and to test current automatic PoS-tagging systems and evaluate their performance on such data. We use Italian texts from LEONIDE, a multilingual corpus of young language learners, to create a tagged gold standard for evaluating UD PoS-tagging performance on non-standard language. With the 3.7 version of Stanza, a Python NLP package, we apply the available automatic PoS taggers, namely ISDT, ParTUT, POSTWITA, TWITTIRÒ and VIT, each trained on different data, to our dataset. Our results show that these taggers, trained on non-standard data or multilingual treebanks, can reach up to 95% accuracy on young multilingual learner data when combined.
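The evaluation described above (comparing each tagger's output against a gold standard and combining several taggers) can be illustrated with a minimal sketch. It assumes that every tagger's UPOS tags are already aligned token by token with the gold tags, and it assumes "combined" means a per-token majority vote; the abstract does not say how the taggers were actually combined. The function names, the example sentence, and all tag sequences are hypothetical and are not taken from the chapter or from Stanza.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Token-level accuracy: share of tokens whose predicted UPOS tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def majority_vote(tagger_outputs):
    """Hypothetical combination: per token, pick the most frequent tag across taggers.

    Ties are broken in favour of the tagger listed first.
    """
    combined = []
    for tags in zip(*tagger_outputs):
        counts = Counter(tags)
        best = max(counts.values())
        # First tag (in tagger order) that reaches the highest vote count.
        combined.append(next(t for t in tags if counts[t] == best))
    return combined


# Hypothetical example for the sentence "io ho mangiato la pizza".
gold = ["PRON", "AUX", "VERB", "DET", "NOUN"]
isdt_tags = ["PRON", "AUX", "VERB", "PRON", "NOUN"]      # invented output, one error
postwita_tags = ["PRON", "VERB", "VERB", "DET", "NOUN"]  # invented output, one error
vit_tags = ["PRON", "AUX", "ADJ", "DET", "NOUN"]         # invented output, one error

combined_tags = majority_vote([isdt_tags, postwita_tags, vit_tags])
for name, tags in [("ISDT", isdt_tags), ("POSTWITA", postwita_tags),
                   ("VIT", vit_tags), ("combined", combined_tags)]:
    print(name, accuracy(tags, gold))
```

In this toy case each tagger makes a different single error, so the vote recovers the gold sequence; this only illustrates why combining taggers can help and says nothing about the chapter's actual figures.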
Document type: Chapter
Language: English
ISBN: 979-1-280-13694-7
Relation: http://books.openedition.org/aaccademia/basictei/10839; http://books.openedition.org/aaccademia/tei/10839
Access URL: http://books.openedition.org/aaccademia/10839
Rights: CC BY-NC-ND 4.0
Accession number: edsrev.170CCBB2
Database: Openedition.org