GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

Bibliographic Details
Title: GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
Authors: Yan, Hui; Lei, Zhenchun; Liu, Changhong; Zhou, Yong
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: With the development of deep learning, many different network architectures have been explored in speaker verification. However, most systems rely on a single deep learning architecture, and hybrid networks that combine different architectures have been little studied in automatic speaker verification (ASV) tasks. In this paper, we propose the GMM-ResNext model for speaker verification. The conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. We therefore extract log Gaussian probability features from the raw acoustic features and use a ResNext-based network as the backbone to extract the speaker embedding. GMM-ResNext combines generative and discriminative models to improve the generalization ability of deep learning models and allows one to more easily specify meaningful priors on model parameters. A two-path GMM-ResNext model based on two gender-related GMMs is also proposed. Experimental results show that the proposed GMM-ResNext achieves relative improvements of 48.1% and 11.3% in EER compared with ResNet34 and ECAPA-TDNN, respectively, on the VoxCeleb1-O test set.
Document Type: Working Paper
DOI: 10.1109/ICASSP48485.2024.10447141
Access URL: http://arxiv.org/abs/2407.03135
Accession Number: edsarx.2407.03135
Database: arXiv
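
The log Gaussian probability features mentioned in the description can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM and computes, for every frame, the log density under each Gaussian component, so that each frame is represented by its scores over all components. The function name, array shapes, and the omission of mixture weights and of any later normalization are assumptions made for illustration; the record does not specify the authors' exact feature definition.

```python
import numpy as np


def log_gaussian_prob_features(frames, means, variances):
    """Per-frame log densities under each diagonal-covariance Gaussian.

    frames:    (T, D) acoustic feature frames
    means:     (K, D) component means
    variances: (K, D) component variances (diagonal covariances, assumed)
    returns:   (T, K) matrix of log N(x_t; mu_k, diag(var_k))
    """
    frames = np.asarray(frames, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dim = frames.shape[1]

    # Broadcast to (T, K, D): every frame against every component mean.
    diff = frames[:, None, :] - means[None, :, :]
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                      # (K,)

    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det[None, :] + mahalanobis)


# Hypothetical toy data: 200 frames of 40-dim features, 8 components.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 40))
means = rng.normal(size=(8, 40))
variances = rng.uniform(0.5, 2.0, size=(8, 40))

lgp = log_gaussian_prob_features(frames, means, variances)
print(lgp.shape)
```

The resulting (frames x components) matrix keeps the full score distribution of each frame over all components, and stacking frames in time order preserves the neighboring-frame context that a convolutional backbone could then exploit.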