Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution

Bibliographic Details
Title: Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution
Authors: Nam, Hyeonuk; Park, Yong-Hwa
Publication Year: 2024
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Description: Frequency dynamic convolution (FDY conv) has been a milestone in the sound event detection (SED) field, but it substantially increases model size because it uses multiple basis kernels. In this work, we propose partial frequency dynamic convolution (PFD conv), which concatenates the output of a static convolution with the output of a dynamic FDY conv in order to limit the increase in model size while maintaining performance. Additionally, we propose multi-dilated frequency dynamic convolution (MDFD conv), which integrates multiple dilated frequency dynamic convolution (DFD conv) branches with different dilation size sets and a static branch within a single convolution module, achieving a 3.17% improvement in polyphonic sound detection score (PSDS) over FDY conv. The proposed methods, together with extensive ablation studies, further improve the understanding and usability of FDY conv variants.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.13312
Accession Number: edsarx.2406.13312
Database: arXiv
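
Note on the description: the model-size argument for PFD conv can be illustrated with a rough weight count. The sketch below is not taken from the paper. It assumes a 3x3 kernel, counts only convolution weights (bias terms and the attention module that mixes the basis kernels are ignored), and uses hypothetical values for the channel counts, the number of basis kernels (`n_basis`), and the share of output channels that is dynamic (`dynamic_ratio`).

```python
def conv_params(c_in, c_out, k=3):
    """Weights in one k x k 2D convolution (bias ignored)."""
    return c_in * c_out * k * k


def fdy_conv_params(c_in, c_out, n_basis, k=3):
    """Assumed FDY conv cost: one full kernel per basis kernel.

    The attention module that weights the basis kernels is not counted.
    """
    return n_basis * conv_params(c_in, c_out, k)


def pfd_conv_params(c_in, c_out, n_basis, dynamic_ratio, k=3):
    """Assumed PFD conv cost: a static branch and a dynamic branch whose
    outputs are concatenated along the channel axis to give c_out channels.
    """
    c_dyn = int(round(c_out * dynamic_ratio))  # channels from the dynamic branch
    c_static = c_out - c_dyn                   # channels from the static branch
    return conv_params(c_in, c_static, k) + fdy_conv_params(c_in, c_dyn, n_basis, k)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    c_in, c_out, n_basis = 64, 128, 4
    print("static conv:", conv_params(c_in, c_out))
    print("FDY conv   :", fdy_conv_params(c_in, c_out, n_basis))
    for ratio in (0.125, 0.25, 0.5):
        print(f"PFD conv (dynamic_ratio={ratio}):",
              pfd_conv_params(c_in, c_out, n_basis, ratio))
```

Under these assumptions, a `dynamic_ratio` of 0 reduces to an ordinary static convolution and a ratio of 1 reduces to full FDY conv, so the ratio trades model size against the share of channels that adapt along frequency.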