Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Bibliographic Details
Title:	Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Authors:	Dhawan, Kunal; Koluguri, Nithin Rao; Jukić, Ante; Langman, Ryan; Balam, Jagadeesh; Ginsburg, Boris
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Computation and Language; Computer Science - Machine Learning
Description:	Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-text foundational models. In this work, we present a comprehensive analysis of building ASR systems with discrete codes. We investigate different methods for codec training, such as quantization schemes and time-domain vs. spectral feature encodings. We further explore ASR training techniques aimed at enhancing performance, training efficiency, and noise robustness. Drawing upon our findings, we introduce a codec ASR pipeline that outperforms Encodec at a similar bit-rate. Remarkably, it also surpasses the state-of-the-art results achieved by strong self-supervised models on the 143-language ML-SUPERB benchmark despite being smaller in size and pretrained on significantly less data. Comment: Accepted at Interspeech 2024
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2407.03495
Accession Number:	edsarx.2407.03495
Database:	arXiv
