SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

Bibliographic Details
Title:	SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
Authors:	Tang, Yuxun; Wu, Yuning; Shi, Jiatong; Jin, Qin
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Description:	Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, directly applying speech SSL models to singing generation runs into a domain gap between speech and singing. Furthermore, singing generation requires a more refined representation than typical speech. To address these challenges, we introduce SingOMD, a novel method to extract singing-oriented multi-resolution discrete representations from speech SSL models. Specifically, we first adapt the features from speech SSL through a resynthesis task and incorporate multi-resolution modules based on resampling to better serve singing generation. These adapted multi-resolution features are then discretized via clustering. Extensive experiments demonstrate the robustness, efficiency, and effectiveness of these representations in singing vocoders and singing voice synthesis. Comment: Accepted by Interspeech 2024
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2406.08905
Accession Number:	edsarx.2406.08905
Database:	arXiv
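
The last two steps named in the description (resampling features to several resolutions, then discretizing them by clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: it skips the resynthesis-based adaptation entirely, replaces the learned multi-resolution modules with plain linear interpolation, and uses a toy k-means. The function names, the resolution factors (1.0 and 0.5), the cluster count, and the random stand-in features are all assumptions made for illustration.

```python
import random


def resample(frames, new_len):
    """Linearly interpolate a [T][D] feature sequence to new_len frames."""
    old_len = len(frames)
    if new_len == 1 or old_len == 1:
        return [list(frames[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to the given frame."""
    return min(range(len(centroids)), key=lambda k: sq_dist(frame, centroids[k]))


def kmeans(frames, k, iters=20, seed=0):
    """Toy Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(f, centroids)].append(f)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def multi_resolution_tokens(frames, factors=(1.0, 0.5), k=4):
    """Resample to each (hypothetical) resolution, then cluster into token ids."""
    tokens = {}
    for factor in factors:
        new_len = max(1, round(len(frames) * factor))
        feats = resample(frames, new_len)
        centroids = kmeans(feats, k)
        tokens[factor] = [nearest(f, centroids) for f in feats]
    return tokens


# Toy stand-in for SSL hidden features: 40 frames, 3 dimensions.
rng = random.Random(1)
features = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
tokens = multi_resolution_tokens(features)
```

Here `tokens` maps each resolution factor to one token id per frame, so the half-rate stream is half as long as the full-rate one. A real system would take features from a speech SSL model and use a much larger codebook.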
