Neural Target Speech Extraction: An overview
Title: | Neural Target Speech Extraction: An overview |
---|---|
Authors: | Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Černocký, Dong Yu |
Source: | IEEE Signal Processing Magazine. 40:8-29 |
Publication info: | Institute of Electrical and Electronics Engineers (IEEE), 2023. |
Publication year: | 2023 |
Subject terms: | FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), Applied Mathematics, Signal Processing, FOS: Electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing |
Description: | Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail-party effect. For decades, researchers have focused on approaching the listening ability of humans. One critical issue is handling interfering speakers because the target and non-target speech signals share similar characteristics, complicating their discrimination. Target speech/speaker extraction (TSE) isolates the speech signal of a target speaker from a mixture of several speakers with or without noises and reverberations using clues that identify the speaker in the mixture. Such clues might be a spatial clue indicating the direction of the target speaker, a video of the speaker's lips, or a pre-recorded enrollment utterance from which their voice characteristics can be derived. TSE is an emerging field of research that has received increased attention in recent years because it offers a practical approach to the cocktail-party problem and involves such aspects of signal processing as audio, visual, array processing, and deep learning. This paper focuses on recent neural-based approaches and presents an in-depth overview of TSE. We guide readers through the different major approaches, emphasizing the similarities among frameworks and discussing potential future directions. Comment: Submitted to IEEE Signal Processing Magazine on Apr. 25, 2022, and accepted on Jan. 12, 2023 |
ISSN: | 1558-0792 1053-5888 |
Access URL: | https://explore.openaire.eu/search/publication?articleId=doi_dedup___::456a9f22ea9ed0377fadd796e1dc6b3f https://doi.org/10.1109/msp.2023.3240008 |
Rights: | OPEN |
Accession number: | edsair.doi.dedup.....456a9f22ea9ed0377fadd796e1dc6b3f |
Database: | OpenAIRE |