Audio Conditioning for Music Generation via Discrete Bottleneck Features

Bibliographic Details
Title:	Audio Conditioning for Music Generation via Discrete Bottleneck Features
Authors:	Rouard, Simon; Adi, Yossi; Copet, Jade; Roebel, Axel; Défossez, Alexandre
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Description:	While most music generation models use textual or parametric conditioning (e.g. tempo, harmony, musical genre), we propose to condition a language-model-based music generation system with audio input. Our exploration involves two distinct strategies. The first strategy, termed textual inversion, leverages a pre-trained text-to-music model to map audio input to corresponding "pseudowords" in the textual embedding space. For the second strategy, we train a music language model from scratch jointly with a text conditioner and a quantized audio feature extractor. At inference time, we can mix textual and audio conditioning and balance them thanks to a novel double classifier-free guidance method. We conduct automatic and human studies that validate our approach. We will release the code, and we provide music samples on https://musicgenstyle.github.io to show the quality of our model.
Comment:	6 pages, 2 figures, accepted at ISMIR 2024
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2407.12563
Accession Number:	edsarx.2407.12563
Database:	arXiv
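The description mentions a "double classifier-free guidance" method for balancing text and audio conditioning at inference time, but does not state its formula. The sketch below assumes one plausible nested form: an inner guidance step moves the audio-only logits toward the text-plus-audio logits (scale `beta`), and an outer step moves the unconditional logits toward that inner estimate (scale `alpha`). The function name `double_cfg`, its parameters, and the exact combination are hypothetical illustrations, not the authors' released code.

```python
from typing import Sequence


def double_cfg(
    logits_uncond: Sequence[float],
    logits_audio: Sequence[float],
    logits_text_audio: Sequence[float],
    alpha: float,
    beta: float,
) -> list[float]:
    """Nested ("double") classifier-free guidance over next-token logits.

    Assumed form (hypothetical sketch, not confirmed by the record):
        l = l_uncond + alpha * (l_audio + beta * (l_text_audio - l_audio) - l_uncond)
    """
    if not (len(logits_uncond) == len(logits_audio) == len(logits_text_audio)):
        raise ValueError("all logit vectors must have the same length")
    out = []
    for u, a, ta in zip(logits_uncond, logits_audio, logits_text_audio):
        # Inner guidance: move from audio-only toward text+audio conditioning.
        inner = a + beta * (ta - a)
        # Outer guidance: move from unconditional toward the inner estimate.
        out.append(u + alpha * (inner - u))
    return out
```

Under this assumed form, `alpha = beta = 1` simply returns the fully conditioned logits, and `beta = 0` reduces to standard classifier-free guidance on the audio condition alone; raising `beta` gives the text more weight relative to the audio.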
