Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

Bibliographic details
Title: Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss
Authors: Soleymanpour, Mohammad; Ismail, Mahmoud Al; Bahmaninezhad, Fahimeh; Kumar, Kshitiz; Wu, Jian
Publication year: 2023
Collection: Computer Science
Subject terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Description: We introduce a bilingual solution that supports English as a secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments are: (a) a pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and, subsequently, a bilingual streaming transformer model, (c) a parallel encoder structure with a language identification (LID) loss, and (d) a parallel encoder with an auxiliary loss for monolingual projections. We conclude that, compared with the LID loss, our proposed auxiliary loss is better at specializing the parallel encoders to their respective monolingual locales, and that this contributes to stronger bilingual learning. We evaluate our work on large-scale training and test tasks for bilingual Spanish (ES) and bilingual Italian (IT) applications. Our bilingual models demonstrate strong English code-mixing capability. In particular, the bilingual IT model reduces the word error rate (WER) on a code-mixed IT task from 46.5% to 13.8%, while achieving near parity with the monolingual IT model on IT tests (9.6% vs. 9.5%).
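Point (a) above, a lexicon whose pronunciations are grapheme units rather than phones, can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the authors' pipeline: the function name `grapheme_lexicon` is hypothetical, and the sketch assumes one unit per Unicode letter after NFC normalization and lowercasing, with no word-position markers or special tokens.

```python
import unicodedata


def grapheme_lexicon(words):
    """Hypothetical sketch: map each word to a grapheme 'pronunciation'.

    Assumption: one unit per letter. NFC normalization keeps an accented
    letter such as 'à' as a single code point (one unit); the word is
    lowercased and non-letter characters (apostrophes, hyphens) are dropped.
    """
    lexicon = {}
    for word in words:
        normalized = unicodedata.normalize("NFC", word).lower()
        lexicon[word] = [ch for ch in normalized if ch.isalpha()]
    return lexicon


# English and Italian entries draw on one shared grapheme inventory,
# so a code-mixed vocabulary needs no cross-language phone mapping.
lex = grapheme_lexicon(["weekend", "città", "Perché"])
for word, units in lex.items():
    print(word, " ".join(units))
```

Under these assumptions, adding an English word to a primary-locale lexicon only requires spelling it out, whereas a phone-based lexicon would need English pronunciations mapped onto the primary locale's phone set.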
Document type: Working Paper
Access URL: http://arxiv.org/abs/2308.06327
Accession number: edsarx.2308.06327
Database: arXiv