Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation

Bibliographic Details
Title: Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation
Authors: Chirkova, Nadezhda, Liang, Sheng, Nikoulina, Vassilina
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
Description: Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM), finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting remains understudied for generation. Previous works note the frequent problem of generating in the wrong language and propose approaches to address it, usually using mT5 as the backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering both full finetuning and parameter-efficient finetuning with adapters. We find that mBART with adapters performs similarly to mT5 of the same size, and that NLLB-200 can be competitive in some cases. We also underline the importance of tuning the learning rate used for finetuning, which helps to alleviate the problem of generation in the wrong language.
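As a concrete illustration of the setup the abstract describes, the sketch below shows parameter-efficient finetuning of mBART with adapters, followed by zero-shot generation in another language. This is a minimal sketch, not the authors' code: it assumes the Hugging Face transformers and peft libraries, and the model name, LoRA hyperparameters, and language codes are illustrative placeholders.

# Minimal sketch (not the paper's implementation) of zero-shot
# cross-lingual transfer for generation: finetune mBART on a task in one
# language with adapters, then decode in another language.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative backbone choice; the paper compares mT5, mBART, and NLLB-200.
model_name = "facebook/mbart-large-50"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Parameter-efficient finetuning: train small adapter weights only,
# keeping the pretrained multilingual backbone frozen.
# (LoRA hyperparameters here are assumptions, not values from the paper.)
peft_config = LoraConfig(task_type="SEQ_2_SEQ_LM", r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, peft_config)

# ... finetune `model` on the task in the source language (e.g. English);
# per the abstract, the learning rate used here matters for avoiding
# generation in the wrong language ...

# Zero-shot transfer: apply the finetuned model to input in another
# language, forcing the target-language BOS token so decoding starts in
# the intended language (standard mBART-50 usage).
tokenizer.src_lang = "fr_XX"
inputs = tokenizer("Texte source en français ...", return_tensors="pt")
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"],
    max_new_tokens=64,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))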
Comment: This preprint describes a preliminary study for our follow-up work arXiv:2402.12279 (NAACL 2024), in which we investigate important factors for enabling zero-shot cross-lingual transfer in generative tasks
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2310.09917
Accession Number: edsarx.2310.09917
Database: arXiv