An Attribute Interpolation Method in Speech Synthesis by Model Merging

Bibliographic details
Title: An Attribute Interpolation Method in Speech Synthesis by Model Merging
Authors: Murata, Masato; Miyazaki, Koichi; Koriyama, Tomoki
Publication year: 2024
Collection: Computer Science
Subject terms: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: With the development of speech synthesis, recent research has focused on challenging tasks, such as speaker generation and emotion intensity control. Attribute interpolation is a common approach to these tasks. However, most previous methods for attribute interpolation require specific modules or training methods. We propose an attribute interpolation method in speech synthesis by model merging. Model merging is a method that creates new parameters by only averaging the parameters of base models. The merged model can generate an output with an intermediate feature of the base models. This method is easily applicable without specific modules or training methods, as it uses only existing trained base models. We merged two text-to-speech models to achieve attribute interpolation and evaluated its performance on speaker generation and emotion intensity control tasks. As a result, our proposed method achieved smooth attribute interpolation while keeping the linguistic content in both tasks.
Comment: Accepted by INTERSPEECH 2024
Document type: Working Paper
Access URL: http://arxiv.org/abs/2407.00766
Accession number: edsarx.2407.00766
Database: arXiv
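
Illustration: the description states that model merging creates new parameters by averaging the parameters of base models. The sketch below shows that idea on toy data only; it is not the authors' implementation. The function name `merge_models`, the dictionary-of-lists parameter layout, and the weight `alpha` are assumptions introduced here. With `alpha = 0.5` the result is a plain average; other values are shown only to suggest how an intermediate attribute might be selected.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def merge_models(params_a: Params, params_b: Params, alpha: float = 0.5) -> Params:
    """Return (1 - alpha) * A + alpha * B for every parameter.

    alpha = 0.5 is plain averaging; alpha = 0 returns A and alpha = 1 returns B.
    Both models must share the same architecture (same names and sizes).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if params_a.keys() != params_b.keys():
        raise ValueError("base models must have identical parameter names")

    merged: Params = {}
    for name, weights_a in params_a.items():
        weights_b = params_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two base models.
        merged[name] = [
            (1.0 - alpha) * a + alpha * b for a, b in zip(weights_a, weights_b)
        ]
    return merged


# Toy example with made-up weights for two "base models".
model_a = {"layer.weight": [0.0, 2.0]}
model_b = {"layer.weight": [1.0, 4.0]}
averaged = merge_models(model_a, model_b, alpha=0.5)
```

In a real text-to-speech setting the two parameter sets would come from trained models with the same architecture, and the merged parameters would be loaded back into that architecture before synthesis.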