ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

Bibliographic Details
Title: ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Authors: Huang, Jiannan; Liew, Jun Hao; Yan, Hanshu; Yin, Yuyang; Zhao, Yao; Wei, Yunchao
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
Description: Recent text-to-image customization works have been proven successful in generating images of given concepts by fine-tuning the diffusion models on a few examples. However, these methods tend to overfit the concepts, resulting in failure to create the concept under multiple conditions (e.g., the headphone is missing when generating 'a dog wearing a headphone'). Interestingly, we notice that the base model before fine-tuning exhibits the capability to compose the base concept with other elements (e.g., a dog wearing a headphone), implying that the compositional ability only disappears after personalization tuning. Inspired by this observation, we present ClassDiffusion, a simple technique that leverages a semantic preservation loss to explicitly regulate the concept space when learning the new concept. Despite its simplicity, this helps avoid semantic drift when fine-tuning on the target concepts. Extensive qualitative and quantitative experiments demonstrate that the use of the semantic preservation loss effectively improves the compositional abilities of the fine-tuned models. In response to the ineffective evaluation of the CLIP-T metric, we introduce the BLIP2-T metric, a more equitable and effective evaluation metric for this particular domain. We also provide an in-depth empirical study and theoretical analysis to better understand the role of the proposed loss. Lastly, we also extend ClassDiffusion to personalized video generation, demonstrating its flexibility.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.17532
Accession Number: edsarx.2405.17532
Database: arXiv