A Study on Incorporating Whisper for Robust Speech Assessment

Bibliographic Details
Title: A Study on Incorporating Whisper for Robust Speech Assessment
Authors: Zezario, Ryandhimas E., Chen, Yu-Wen, Fu, Szu-Wei, Tsao, Yu, Wang, Hsin-Min, Fuh, Chiou-Shann
Publication Year: 2023
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Description: This research introduces MOSA-Net+, an enhanced multi-objective speech assessment model that leverages acoustic features from Whisper, a large-scale weakly supervised model. We first investigate the effectiveness of Whisper in building a more robust speech assessment model. We then explore combining representations from Whisper and self-supervised learning (SSL) models. The experimental results reveal that Whisper's embedding features contribute to more accurate predictions, whereas combining the embedding features from Whisper and SSL models yields only marginal further improvement. Compared with intrusive methods, MOSA-Net, and other SSL-based speech assessment models, MOSA-Net+ yields notable improvements in estimating subjective quality and intelligibility scores across all evaluation metrics on the Taiwan Mandarin Hearing In Noise test - Quality & Intelligibility (TMHINT-QI) dataset. To further validate its robustness, MOSA-Net+ was tested in the noisy-and-enhanced track of the VoiceMOS Challenge 2023, where it ranked first among nine systems.
Comment: Accepted to IEEE ICME 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2309.12766
Accession Number: edsarx.2309.12766
Database: arXiv