Report
Unknown Script: Impact of Script on Cross-Lingual Transfer
Title: | Unknown Script: Impact of Script on Cross-Lingual Transfer |
---|---|
Authors: | Tufa, Wondimagegnhue Tsegaye; Markov, Ilia; Vossen, Piek |
Publication year: | 2024 |
Collection: | Computer Science |
Subject terms: | Computer Science - Computation and Language |
Description: | Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models that are pre-trained on different tokenization methods to determine factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size. Comment: Paper accepted to NAACL Student Research Workshop (SRW) 2024 |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/2404.18810 |
Accession number: | edsarx.2404.18810 |
Database: | arXiv |