Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data

Bibliographic Details
Title: Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data
Authors: Moliner, Eloi; Braun, Sebastian; Gamper, Hannes
Publication Year: 2024
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
Description: Audio domain transfer is the process of modifying audio signals to match characteristics of a different domain, while retaining the original content. This paper investigates the potential of Gaussian Flow Bridges, an emerging approach in generative modeling, for this problem. The presented framework addresses the transport problem across different distributions of audio signals through the implementation of a series of two deterministic probability flows. The proposed framework facilitates manipulation of the target distribution properties through a continuous control variable, which defines a certain aspect of the target domain. Notably, this approach does not rely on paired examples for training. To address identified challenges on maintaining the speech content consistent, we recommend a training strategy that incorporates chunk-based minibatch Optimal Transport couplings of data samples and noise. Comparing our unsupervised method with established baselines, we find competitive performance in tasks of reverberation and distortion manipulation. Despite encountering limitations, the intriguing results obtained in this study underscore potential for further exploration.
Comment: Submitted to IWAENC 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.19497
Accession Number: edsarx.2405.19497
Database: arXiv