Report
On The Open Prompt Challenge In Conditional Audio Generation
Title: | On The Open Prompt Challenge In Conditional Audio Generation |
---|---|
Authors: | Chang, Ernie; Srinivasan, Sidd; Luthra, Mahi; Lin, Pin-Jie; Nagaraja, Varun; Iandola, Forrest; Liu, Zechun; Ni, Zhaoheng; Zhao, Changsheng; Shi, Yangyang; Chandra, Vikas |
Publication Year: | 2023 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing |
Description: | Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging as user-input prompts are often under-specified when compared to text descriptions used to train TTA models. In this work, we treat TTA models as a "blackbox" and address the user prompt challenge with two key insights: (1) User prompts are generally under-specified, leading to a large alignment gap between user prompts and training prompts. (2) There is a distribution of audio descriptions for which TTA models are better at generating higher quality audio, which we refer to as "audionese". To this end, we rewrite prompts with instruction-tuned models and propose utilizing text-audio alignment as feedback signals via margin ranking learning for audio improvements. On both objective and subjective human evaluations, we observed marked improvements in both text-audio alignment and music audio quality. Comment: 5 pages, 3 figures, 4 tables |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2311.00897 |
Accession Number: | edsarx.2311.00897 |
Database: | arXiv |