WHAM!: Extending Speech Separation to Noisy Environments

Bibliographic details
Title: WHAM!: Extending Speech Separation to Noisy Environments
Authors: Jonathan Le Roux, Michael Flynn, Joseph M. Antognini, Dwight Crow, Emmett McQuinn, Gordon Wichern, Ethan Manilow, Licheng Richard Zhu
Source: INTERSPEECH
Publication info: arXiv, 2019.
Publication year: 2019
Subject terms: FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Computer Science - Computation and Language, Computer science, Speech recognition, Ambient noise level, Machine Learning (stat.ML), Cocktail party effect, Computer Science - Sound, Machine Learning (cs.LG), Background noise, Noise, Sampling (signal processing), Statistics - Machine Learning, Robustness (computer science), Audio and Speech Processing (eess.AS), Benchmark (computing), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing, Communication channel
Description: Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two-speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
Comment: Accepted for publication at Interspeech 2019
DOI: 10.48550/arxiv.1907.01160
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4acf90dac562c7d9777fd199409e990e
Rights: OPEN
Accession number: edsair.doi.dedup.....4acf90dac562c7d9777fd199409e990e
Database: OpenAIRE