Cross-dialect lexicon optimisation for an endangered language ASR system: the case of Irish

Bibliographic details
Title: Cross-dialect lexicon optimisation for an endangered language ASR system: the case of Irish
Authors: Lonergan, L; Qian, M; Chiaráin, NN; Gobl, C; Chasaide, AN
Source: Interspeech 2022.
Publication info: ISCA, 2022.
Publication year: 2022
Subject terms: minority language, Irish, speech recognition, lexicon, cross-dialect variation
Description: Lexicon optimisation strategies that address the problem of dialect divergence are tested in an ASR system for Irish. Like many endangered languages, Irish has no spoken standard but rather three very different dialects: Ulster (Ul), Connaught (Co) and Munster (Mu). Furthermore, the complex sound system and the ancient, opaque writing system result in sound-to-grapheme mappings that differ considerably across dialects. A hybrid ASR system was trained on (predominantly) native-speaker speech data, balanced across the dialects. Experiment 1 tested whether a Global lexicon, which captures dialect variant forms with relatively abstract representations, can perform as well as a Multi-dialect lexicon containing all dialect variants. Three dialect-specific lexicons were also included in the tests. The Global lexicon yielded the best performance, and Experiment 2 tested whether further reductions to its phoneset might enhance its performance further. These reductions were (i) merging a Tense-Lax contrast among coronal sonorants, which is not common to all dialects, and (ii) merging the voiceless-voiced contrast among sonorants, as the voiceless member is relatively infrequent. Results showed only a slight improvement, and only for the Mu dialect, which is the one most closely aligned with the phoneset reduction.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4b3dba0dafcaad89c9f507a4482f718c
https://doi.org/10.21437/interspeech.2022-838
Accession number: edsair.doi.dedup.....4b3dba0dafcaad89c9f507a4482f718c
Database: OpenAIRE
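
Illustrative note: the phoneset reduction mentioned in the description (merging a contrast so that two phones share one symbol) can be pictured as a rewrite pass over a pronunciation lexicon. The sketch below is only a minimal illustration under stated assumptions, not the authors' implementation: the phone symbols, the merge directions, the toy words, and the names `MERGES`, `reduce_phoneset` and `phoneset` are all hypothetical and are not taken from the paper.

```python
# Hypothetical merge table: each key is replaced by its value.
# Symbols are placeholders, NOT the phoneset used in the paper.
MERGES = {
    "NN": "n",   # assumed: tense coronal nasal -> lax counterpart
    "LL": "l",   # assumed: tense coronal lateral -> lax counterpart
    "n_0": "n",  # assumed: voiceless sonorant -> voiced counterpart
    "l_0": "l",
    "r_0": "r",
}


def reduce_phoneset(lexicon, merges):
    """Apply phone merges to every pronunciation, dropping any
    duplicate pronunciations that the merge creates."""
    reduced = {}
    for word, prons in lexicon.items():
        unique = []
        for pron in prons:
            merged = tuple(merges.get(phone, phone) for phone in pron)
            if merged not in unique:
                unique.append(merged)
        reduced[word] = unique
    return reduced


def phoneset(lexicon):
    """Collect the set of phone symbols used anywhere in the lexicon."""
    return {phone for prons in lexicon.values() for pron in prons for phone in pron}


# Toy lexicon with made-up entries: word -> list of pronunciations.
toy_lexicon = {
    "word_a": [("k", "a", "NN"), ("k", "a", "n")],
    "word_b": [("l_0", "a"), ("LL", "a")],
}

reduced = reduce_phoneset(toy_lexicon, MERGES)
print(len(phoneset(toy_lexicon)), "->", len(phoneset(reduced)))
```

In this toy case the merge collapses each word's two variants into one, which shows why a reduction of this kind shrinks both the phone inventory and the number of pronunciation variants per word.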