Report
Exploring the Capability of Mamba in Speech Applications
Title: | Exploring the Capability of Mamba in Speech Applications |
---|---|
Authors: | Miyazaki, Koichi; Masuyama, Yoshiki; Murata, Masato |
Publication year: | 2024 |
Collection: | Computer Science |
Subject terms: | Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing |
Description: | This paper explores the capability of Mamba, a recently proposed architecture based on state space models (SSMs), as a competitive alternative to Transformer-based models. In the speech domain, well-designed Transformer-based models, such as the Conformer and E-Branchformer, have become the de facto standards. Extensive evaluations have demonstrated the effectiveness of these Transformer-based models across a wide range of speech tasks. In contrast, the evaluation of SSMs has been limited to a few tasks, such as automatic speech recognition (ASR) and speech synthesis. In this paper, we compared Mamba with state-of-the-art Transformer variants for various speech applications, including ASR, text-to-speech, spoken language understanding, and speech summarization. Experimental evaluations revealed that Mamba achieves comparable or better performance than Transformer-based models, and demonstrated its efficiency in long-form speech processing. Comment: Accepted at Interspeech 2024 |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/2406.16808 |
Accession number: | edsarx.2406.16808 |
Database: | arXiv |
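The abstract credits Mamba, an SSM-based architecture, with efficiency in long-form speech processing. The toy sketch below illustrates why a state space model scales well with sequence length: it processes the input in one left-to-right scan with a fixed-size hidden state, so cost grows linearly with the number of frames, whereas self-attention compares every pair of frames and grows quadratically. It is a minimal single-channel illustration only, assuming a diagonal state matrix with strictly negative entries and zero-order-hold discretization; the input-dependent ("selective") parameters come from hypothetical scalar weights `w_b`, `w_c`, `w_delta`, and `b_delta`. It is not the paper's code or the official Mamba implementation.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_ssm_scan(
    x: Sequence[float],
    a: Sequence[float],     # diagonal of A; assumed strictly negative for stability
    w_b: Sequence[float],   # hypothetical projection: B_t[i] = w_b[i] * x_t
    w_c: Sequence[float],   # hypothetical projection: C_t[i] = w_c[i] * x_t
    w_delta: float = 1.0,   # hypothetical projection for the step size
    b_delta: float = 0.0,   # hypothetical bias for the step size
) -> List[float]:
    """Toy selective SSM scan: O(T * N) time and O(N) state for T steps, N states."""
    n = len(a)
    h = [0.0] * n  # hidden state; its size does not depend on sequence length
    ys: List[float] = []
    for x_t in x:
        # Input-dependent step size, kept positive by softplus
        delta = softplus(w_delta * x_t + b_delta)
        y_t = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])                 # ZOH discretization of A
            b_bar = (a_bar - 1.0) / a[i] * (w_b[i] * x_t)  # ZOH discretization of B_t
            h[i] = a_bar * h[i] + b_bar * x_t              # recurrent state update
            y_t += (w_c[i] * x_t) * h[i]                   # readout y_t = C_t h_t
        ys.append(y_t)
    return ys
```

For example, `selective_ssm_scan([0.5, -1.0, 2.0], [-1.0, -0.5], [0.3, 0.2], [1.0, -1.0])` returns one output per input frame. Because the scan is causal, running it on a prefix of the input reproduces the corresponding prefix of the output, which is what lets such a model stream over long recordings with constant memory per step.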