Report
Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting
Title: | Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting |
---|---|
Authors: | Liu, Emmy, Chaudhary, Aditi, Neubig, Graham |
Publication year: | 2023 |
Collection: | Computer Science |
Subject terms: | Computer Science - Computation and Language |
Description: | Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ~4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and using retrieval-augmented models. This not only improves the accuracy of a strong pretrained MT model on idiomatic sentences by up to 13% in absolute accuracy, but also holds potential benefits for non-idiomatic sentences. Comment: EMNLP 2023 |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/2310.07081 |
Accession number: | edsarx.2310.07081 |
Database: | arXiv |
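As a rough illustration of the loss-upweighting idea mentioned in the description, the sketch below scales the per-sentence training loss of sentences flagged as potentially idiomatic before averaging. The function name `weighted_corpus_loss`, the default `idiom_weight` of 2.0, and the assumption that each sentence already carries an idiom flag are all hypothetical. This is not the authors' implementation, and how sentences get flagged is outside its scope.

```python
from typing import Sequence


def weighted_corpus_loss(
    losses: Sequence[float],
    is_idiomatic: Sequence[bool],
    idiom_weight: float = 2.0,  # hypothetical upweighting factor
) -> float:
    """Average per-sentence losses, multiplying flagged sentences by idiom_weight.

    Assumes `is_idiomatic[i]` marks sentence i as potentially idiomatic
    (the flagging heuristic itself is not modeled here).
    """
    if len(losses) != len(is_idiomatic):
        raise ValueError("losses and is_idiomatic must have the same length")
    if not losses:
        raise ValueError("need at least one sentence")
    # Weight 1.0 for ordinary sentences, idiom_weight for flagged ones.
    weights = [idiom_weight if flag else 1.0 for flag in is_idiomatic]
    total = sum(w * loss for w, loss in zip(weights, losses))
    # Divide by the sentence count, so flagged sentences contribute more to the mean.
    return total / len(losses)


if __name__ == "__main__":
    # Second sentence is flagged, so its loss of 2.0 counts as 4.0.
    print(weighted_corpus_loss([1.0, 2.0], [False, True], idiom_weight=2.0))
```

Dividing by the sentence count rather than by the sum of weights keeps the sketch equivalent to multiplying each flagged sentence's loss by a constant; normalizing by total weight would be an equally plausible variant.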