Report
Speaker detection in the wild: Lessons learned from JSALT 2019
Title: | Speaker detection in the wild: Lessons learned from JSALT 2019 |
---|---|
Authors: | Garcia, Paola; Villalba, Jesus; Bredin, Herve; Du, Jun; Castan, Diego; Cristia, Alejandrina; Bullock, Latane; Guo, Ling; Okabe, Koji; Nidadavolu, Phani Sankar; Kataria, Saurabh; Chen, Sizhu; Galmant, Leo; Lavechin, Marvin; Sun, Lei; Gill, Marie-Philippe; Ben-Yair, Bar; Abdoli, Sajjad; Wang, Xin; Bouaziz, Wassim; Titeux, Hadrien; Dupoux, Emmanuel; Lee, Kong Aik; Dehak, Najim |
Publication year: | 2019 |
Collection: | Computer Science |
Subject terms: | Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound |
Description: | This paper presents the problems addressed, and the solutions developed, at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios. The main focus was to tackle a wide range of conditions, from meetings to speech in the wild. We describe the research threads we explored and a set of modules that were successful in these scenarios. The ultimate goal was to explore speaker detection, but our first finding was that effective diarization improves detection, and that omitting the diarization stage degrades performance. All the configurations in our research agree on this point and follow a common backbone that includes diarization as a preceding stage. With this backbone, we analyzed the following problems: voice activity detection, handling noisy signals, domain mismatch, improving clustering, and the overall impact of the earlier stages on the final speaker detection. In this paper, we show partial results for speaker diarization to give a better understanding of the problem, and we present the final results for speaker detection. Comment: Submitted to ICASSP 2020 |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/1912.00938 |
Accession number: | edsarx.1912.00938 |
Database: | arXiv |
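The abstract's central pipeline claim (run diarization first, then detect the target speaker) can be illustrated with a minimal Python sketch. It assumes that a previous diarization stage has already produced (cluster label, speaker embedding) pairs for each segment and that the target speaker is represented by a single enrollment embedding. Cosine scoring against cluster centroids and max-over-clusters pooling are illustrative choices, not the authors' method; the function names `detection_score` and `pooled_score` and the two-dimensional toy embeddings are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def detection_score(segments, enrollment):
    """Hypothetical diarization-then-detection scoring (illustrative only).

    segments: (cluster_label, embedding) pairs, assumed to come from a
    previous diarization stage. Each cluster is summarized by the centroid
    of its segment embeddings and compared with the enrollment embedding;
    the recording-level score is the score of the best-matching cluster.
    """
    clusters = defaultdict(list)
    for label, embedding in segments:
        clusters[label].append(embedding)
    per_cluster = {
        label: cosine(centroid(embs), enrollment) for label, embs in clusters.items()
    }
    return max(per_cluster.values()), per_cluster


def pooled_score(segments, enrollment):
    """Hypothetical baseline without diarization: average all segments together."""
    return cosine(centroid([emb for _, emb in segments]), enrollment)


# Toy input: two segments from the target speaker ("spk0"), one from another speaker.
segments = [("spk0", [1.0, 0.0]), ("spk0", [0.9, 0.1]), ("spk1", [0.0, 1.0])]
enrollment = [1.0, 0.0]

best, per_cluster = detection_score(segments, enrollment)
baseline = pooled_score(segments, enrollment)
print(f"diarized: {best:.3f}, pooled: {baseline:.3f}")
```

On this toy input the diarized score (about 0.999) exceeds the pooled, no-diarization score (about 0.865), because the other speaker's segment dilutes the pooled average. The numbers are invented for illustration and are not results from the paper.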