Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

Bibliographic Details
Title: Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection
Authors: Nam, Hyeonuk; Kim, Seong-Hu; Min, Deokki; Lee, Junhyeok; Park, Yong-Hwa
Publication Year: 2024
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Description: Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by a frequency-varying combination of basis kernels. However, FDY conv lacks an explicit means to diversify frequency-adaptive kernels, potentially limiting its performance. In addition, the size of basis kernels is limited, while time-frequency patterns span a larger spectro-temporal range. Therefore, we propose dilated frequency dynamic convolution (DFD conv), which diversifies and expands frequency-adaptive kernels by introducing different dilation sizes to basis kernels. Experiments showed the advantages of varying dilation sizes along the frequency dimension, and analysis of attention weight variance proved that dilated basis kernels are effectively diversified. By adapting a class-wise median filter with the intersection-based F1 score, the proposed DFD-CRNN outperforms FDY-CRNN by 3.12% in terms of polyphonic sound detection score (PSDS).
Comment: Accepted to INTERSPEECH 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.05341
Accession Number: edsarx.2406.05341
Database: arXiv
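
Illustrative note: the description above says that frequency-adaptive kernels are formed as a frequency-varying combination of basis kernels, and that DFD conv gives those basis kernels different dilation sizes. The following is a minimal pure-Python sketch of that idea only, not the authors' implementation. The function names, the zero-padding of dilated kernels to a common size, and the hard-coded per-frequency weights are all assumptions made for illustration; the record mentions attention weights, which in practice would be learned from the input rather than fixed.

```python
# Illustrative sketch only; names and padding strategy are hypothetical.

def dilate_kernel(kernel, dilation):
    """Spread the taps of a 2-D kernel apart by inserting (dilation - 1) zeros."""
    rows, cols = len(kernel), len(kernel[0])
    out_rows = dilation * (rows - 1) + 1
    out_cols = dilation * (cols - 1) + 1
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            out[r * dilation][c * dilation] = kernel[r][c]
    return out


def pad_to(kernel, size):
    """Zero-pad a square, odd-sized kernel symmetrically to size x size."""
    k = len(kernel)
    offset = (size - k) // 2
    out = [[0.0] * size for _ in range(size)]
    for r in range(k):
        for c in range(k):
            out[r + offset][c + offset] = kernel[r][c]
    return out


def frequency_adaptive_kernel(basis_kernels, dilations, weights):
    """Weighted sum of dilated basis kernels for a single frequency bin.

    `weights` stands in for the attention weights of that bin (assumed fixed here).
    """
    dilated = [dilate_kernel(k, d) for k, d in zip(basis_kernels, dilations)]
    size = max(len(k) for k in dilated)
    padded = [pad_to(k, size) for k in dilated]
    return [
        [sum(w * p[r][c] for w, p in zip(weights, padded)) for c in range(size)]
        for r in range(size)
    ]


# Three 3x3 basis kernels (all ones, purely for demonstration) with dilations 1, 2, 3.
ones = [[1.0] * 3 for _ in range(3)]
basis = [ones, ones, ones]
dilations = [1, 2, 3]

# Hypothetical weights for two frequency bins: one favors the compact kernel,
# the other favors the widest one, so the effective kernels differ by frequency.
kernel_low = frequency_adaptive_kernel(basis, dilations, [1.0, 0.0, 0.0])
kernel_high = frequency_adaptive_kernel(basis, dilations, [0.2, 0.3, 0.5])
```

In this sketch, a 3x3 basis kernel with dilation 3 covers a 7x7 area, so the combined kernel can reach a wider spectro-temporal range than any undilated basis kernel while keeping the same number of nonzero taps per basis.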