MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Bibliographic Details
Title:	MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Authors:	Zezario, Ryandhimas E., Fu, Szu-wei, Chen, Fei, Fuh, Chiou-Shann, Wang, Hsin-Min, Tsao, Yu
Publication Year:	2022
Collection:	Computer Science
Subject Terms:	Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
Description:	Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies develop deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores. Comment: Accepted to Interspeech 2022
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2204.03310
Accession Number:	edsarx.2204.03310
Database:	arXiv
