Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation

Bibliographic Details
Title: Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation
Authors: Kohler, Jonas; Pumarola, Albert; Schönfeld, Edgar; Sanakoyeu, Artsiom; Sumbaly, Roshan; Vajda, Peter; Thabet, Ali
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
Description: Diffusion models are a powerful generative framework, but come with expensive inference. Existing acceleration methods often compromise image quality or fail under complex conditioning when operating in an extremely low-step regime. In this work, we propose a novel distillation framework tailored to enable high-fidelity, diverse sample generation using just one to three steps. Our approach comprises three key components: (i) Backward Distillation, which mitigates training-inference discrepancies by calibrating the student on its own backward trajectory; (ii) Shifted Reconstruction Loss that dynamically adapts knowledge transfer based on the current time step; and (iii) Noise Correction, an inference-time technique that enhances sample quality by addressing singularities in noise prediction. Through extensive experiments, we demonstrate that our method outperforms existing competitors in quantitative metrics and human evaluations. Remarkably, it achieves performance comparable to the teacher model using only three denoising steps, enabling efficient high-quality generation.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.05224
Accession Number: edsarx.2405.05224
Database: arXiv