Report
Lumiere: A Space-Time Diffusion Model for Video Generation
Title: | Lumiere: A Space-Time Diffusion Model for Video Generation |
---|---|
Authors: | Bar-Tal, Omer; Chefer, Hila; Tov, Omer; Herrmann, Charles; Paiss, Roni; Zada, Shiran; Ephrat, Ariel; Hur, Junhwa; Liu, Guanghui; Raj, Amit; Li, Yuanzhen; Rubinstein, Michael; Michaeli, Tomer; Wang, Oliver; Sun, Deqing; Dekel, Tali; Mosseri, Inbar |
Publication year: | 2024 |
Collection: | Computer Science |
Subject terms: | Computer Science - Computer Vision and Pattern Recognition |
Description: | We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation. Comment: Webpage: https://lumiere-video.github.io/; Video: https://www.youtube.com/watch?v=wxLr02Dz2Sc |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/2401.12945 |
Accession number: | edsarx.2401.12945 |
Database: | arXiv |