An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

Bibliographic Details
Title: An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Authors: Lee, Jihwan; Bae, Jae-Sung; Mun, Seongkyu; Choi, Heejin; Lee, Joun Yeop; Cho, Hoon-Young; Kim, Chanwoo
Publication Year: 2022
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Description: With recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. Vowel space analysis, which is often used to explore various aspects of language including L2 accents, is a useful alternative analysis tool. In this study, we apply vowel space analysis to explore the L2 accents of cross-lingual TTS systems. Through this analysis, we make three observations: a) a parallel architecture (Glow-TTS) is less L2-accented than an auto-regressive one (Tacotron); b) L2 accents are more pronounced in the vowels that a language pair does not share; and c) L2 accents of cross-lingual TTS systems share some phenomena with those of human L2 learners. Our findings imply that TTS systems need to handle each language pair differently, depending on linguistic characteristics such as non-shared vowels. They also hint that linguistic knowledge can be further incorporated when developing cross-lingual TTS systems.
Comment: Submitted to ICASSP 2023
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2211.03078
Accession Number: edsarx.2211.03078
Database: arXiv