Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

Bibliographic Details
Title: Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
Authors: Guo, Zilu; Du, Jun; Lee, Chin-Hui; Gao, Yu; Zhang, Wenbin
Publication Year: 2023
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Sound
Description: The goal of this study is to implement diffusion models for speech enhancement (SE). The first step is to emphasize the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. Subsequently, we present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods. We demonstrate that these two methods are special cases of the proposed framework. Additionally, we provide a practical example of VP-based interpolation diffusion for the SE task. To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models and suggest amenable hyper-parameters. Finally, we evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2306.08527
Accession Number: edsarx.2306.08527
Database: arXiv
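The description contrasts variance-preserving (VP) and variance-exploding (VE) interpolation diffusion. The Python sketch below is a rough, hypothetical illustration of that distinction and is not the paper's formulation. It perturbs a linear interpolation between a clean signal `x0` and a noisy signal `y` with Gaussian noise under two assumed schedules. The first is a linear-beta VP schedule, whose mean scale alpha and noise standard deviation satisfy alpha² + std² = 1. The second is a geometric VE schedule, which leaves the mean unscaled and lets the noise grow. The interpolation weight λ_t = t and every schedule parameter (`beta_min`, `beta_max`, `sigma_min`, `sigma_max`) are illustrative assumptions.

```python
import math
import random


def interp_mean(x0, y, lam):
    """Element-wise linear interpolation: returns x0 at lam=0 and y at lam=1.

    Assumption: the interpolation weight is simply lam = t.
    """
    return [(1.0 - lam) * a + lam * b for a, b in zip(x0, y)]


def vp_coeffs(t, beta_min=0.1, beta_max=2.0):
    """Mean scale and noise std of an assumed linear-beta VP schedule, t in [0, 1].

    alpha**2 + std**2 == 1 for every t (the 'variance-preserving' property).
    """
    log_alpha = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha = math.exp(log_alpha)
    return alpha, math.sqrt(1.0 - alpha**2)


def ve_coeffs(t, sigma_min=0.05, sigma_max=0.5):
    """Mean scale and noise std of an assumed geometric VE schedule.

    The mean is left unscaled; the noise std grows from sigma_min to sigma_max.
    """
    return 1.0, sigma_min * (sigma_max / sigma_min) ** t


def perturb(x0, y, t, coeffs, rng):
    """Draw x_t = scale * interp_mean(x0, y, t) + std * z, with z ~ N(0, I)."""
    scale, std = coeffs(t)
    return [scale * m + std * rng.gauss(0.0, 1.0) for m in interp_mean(x0, y, t)]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.0, 0.5, -0.5]   # hypothetical clean-speech features
    noisy = [0.3, 0.2, -0.9]   # hypothetical noisy-speech features
    for t in (0.0, 0.5, 1.0):
        a, s = vp_coeffs(t)
        print(f"t={t}: VP scale={a:.3f} std={s:.3f} | VE std={ve_coeffs(t)[1]:.3f}")
    print("VP sample at t=0.5:", perturb(clean, noisy, 0.5, vp_coeffs, rng))
    print("VE sample at t=0.5:", perturb(clean, noisy, 0.5, ve_coeffs, rng))
```

If the interpolated mean has unit variance and is independent of the noise, the VP sample keeps unit variance at every t because alpha² + std² = 1. The VE sample instead gains variance σ_t², which grows with t. This is the property the two names refer to.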