Report
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
Title: | Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints |
---|---|
Authors: | Lee, PeiYing; Guo, HauYun; Chen, Berlin |
Publication year: | 2024 |
Collection: | Computer Science |
Subject terms: | Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Sound |
Description: | End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It handles a flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that uses speaker activity information to guide the Transformer encoders at the lower layer of the EEND-EDA model, strengthening the effect of their self-attention modules. Results on the public Mini LibriSpeech dataset demonstrate the effectiveness of the approach, reducing the Diarization Error Rate from 30.95% to 28.17%. We will release the source code on GitHub to allow further research and reproducibility. Comment: Accepted to the 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI); in Chinese. |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/2403.14268 |
Accession number: | edsarx.2403.14268 |
Database: | arXiv |