Dynamically Composing Domain-Data Selection with Clean-Data Selection by 'Co-Curricular Learning' for Neural Machine Translation

Bibliographic Details
Title: Dynamically Composing Domain-Data Selection with Clean-Data Selection by 'Co-Curricular Learning' for Neural Machine Translation
Authors: Wang, Wei; Caswell, Isaac; Chelba, Ciprian
Source: The 57th Annual Meeting of the Association for Computational Linguistics (ACL2019)
Publication Year: 2019
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Machine Learning
Description: Noise and domain are important aspects of data quality for neural machine translation. Existing research focuses separately on domain-data selection, clean-data selection, or their static combination, leaving the dynamic interaction between them unexamined. This paper introduces a "co-curricular learning" method that composes dynamic domain-data selection with dynamic clean-data selection, enabling transfer learning across both capabilities. We apply an EM-style optimization procedure to further refine the "co-curriculum". Experimental results and analysis on two domains demonstrate the effectiveness of the method and the properties of the data scheduled by the co-curriculum.
Comment: 11 pages
Document Type: Working Paper
Access URL: http://arxiv.org/abs/1906.01130
Accession Number: edsarx.1906.01130
Database: arXiv