ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

Bibliographic Details
Title: ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Authors: Huang, Jiannan; Liew, Jun Hao; Yan, Hanshu; Yin, Yuyang; Zhao, Yao; Wei, Yunchao
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
Description: Recent text-to-image customization works have been proven successful in generating images of given concepts by fine-tuning the diffusion models on a few examples. However, these methods tend to overfit the concepts, resulting in failure to create the concept under multiple conditions (e.g., the headphone is missing when generating 'a dog wearing a headphone'). Interestingly, we notice that the base model before fine-tuning exhibits the capability to compose the base concept with other elements (e.g., a dog wearing a headphone), implying that the compositional ability only disappears after personalization tuning. Inspired by this observation, we present ClassDiffusion, a simple technique that leverages a semantic preservation loss to explicitly regulate the concept space when learning the new concept. Despite its simplicity, this helps avoid semantic drift when fine-tuning on the target concepts. Extensive qualitative and quantitative experiments demonstrate that the use of the semantic preservation loss effectively improves the compositional abilities of the fine-tuned models. In response to the ineffective evaluation of the CLIP-T metric, we introduce the BLIP2-T metric, a more equitable and effective evaluation metric for this particular domain. We also provide an in-depth empirical study and theoretical analysis to better understand the role of the proposed loss. Lastly, we also extend ClassDiffusion to personalized video generation, demonstrating its flexibility.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.17532
Accession Number: edsarx.2405.17532
Database: arXiv