Frequency & Channel Attention for Computationally Efficient Sound Event Detection

Bibliographic Details
Title: Frequency & Channel Attention for Computationally Efficient Sound Event Detection
Authors: Nam, Hyeonuk; Kim, Seong-Hu; Min, Deokki; Park, Yong-Hwa
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: We explore various attention methods on the frequency and channel dimensions for sound event detection (SED), aiming to enhance performance with minimal increase in computational cost while leveraging domain knowledge about the frequency dimension of audio data. In a previous work, we introduced frequency dynamic convolution (FDY conv) to alleviate the translational equivariance issue that arises when 2D convolution is applied along the frequency dimension of 2D audio data. Although this approach demonstrated state-of-the-art SED performance, it resulted in a model with 150% more trainable parameters. To achieve comparable SED performance with computationally efficient methods that are more practical, we explore lighter alternative attention methods, focusing on those applied to the frequency and channel dimensions. Jointly applying the squeeze-and-excitation (SE) module and time-frame frequency-wise SE (tfwSE), so that attention is applied on both the frequency and channel dimensions, shows performance comparable to the SED model with FDY conv while using only 2.7% more trainable parameters than the baseline model. In addition, we perform a class-wise comparison of the attention methods to further discuss their characteristics.
Comment: Accepted to DCASE 2023 workshop
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2306.11277
Accession Number: edsarx.2306.11277
Database: arXiv
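
Illustrative sketch (not part of the catalog record): the description mentions applying SE on the channel dimension jointly with a frequency-wise SE computed per time frame. The NumPy code below is a minimal sketch of that idea and is not the authors' implementation. It assumes a feature map of shape (channels, time, frequency), a standard two-layer bottleneck gate, and that the frequency-wise variant pools over channels and gates frequency bins independently for each time frame. The function names, the reduction ratio of 4, and the tensor sizes are all hypothetical choices made for illustration; the paper's actual tfwSE design may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_gate(z, w1, b1, w2, b2):
    """Bottleneck gate on the last axis: FC -> ReLU -> FC -> sigmoid."""
    h = np.maximum(z @ w1 + b1, 0.0)
    return sigmoid(h @ w2 + b2)

def make_params(dim, reduction, rng):
    """Random weights for one gate; `reduction` is an assumed hyperparameter."""
    hidden = max(dim // reduction, 1)
    w1 = rng.standard_normal((dim, hidden)) * 0.1
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, dim)) * 0.1
    b2 = np.zeros(dim)
    return w1, b1, w2, b2

def channel_se(x, params):
    """Channel attention: squeeze over time and frequency, gate each channel."""
    z = x.mean(axis=(1, 2))            # (C,)
    g = se_gate(z, *params)            # (C,)
    return x * g[:, None, None]

def framewise_freq_se(x, params):
    """Assumed tfwSE-like step: squeeze over channels, gate frequency per frame."""
    z = x.mean(axis=0)                 # (T, F)
    g = se_gate(z, *params)            # (T, F), one gate vector per time frame
    return x * g[None, :, :]

def n_params(params):
    """Number of trainable scalars in one gate."""
    return sum(p.size for p in params)

rng = np.random.default_rng(0)
C, T, F = 16, 10, 32                   # hypothetical feature-map size
x = rng.standard_normal((C, T, F))
ch_params = make_params(C, 4, rng)
fq_params = make_params(F, 4, rng)

# Joint application: channel attention followed by frame-wise frequency attention.
y = framewise_freq_se(channel_se(x, ch_params), fq_params)
print(y.shape, n_params(ch_params), n_params(fq_params))
```

Both gates add only two small fully connected layers each, which is consistent with the description's point that this kind of attention increases the parameter count only slightly. The specific counts printed here depend entirely on the assumed sizes and say nothing about the 2.7% figure reported in the record.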