Streaming Video Diffusion: Online Video Editing with Diffusion Models

Bibliographic Details
Title:	Streaming Video Diffusion: Online Video Editing with Diffusion Models
Authors:	Chen, Feng; Yang, Zhen; Zhuang, Bohan; Wu, Qi
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Computer Vision and Pattern Recognition
Description:	We present a novel task called online video editing, which is designed to edit streaming frames while maintaining temporal consistency. Unlike existing offline video editing, which assumes that all frames are available in advance, online video editing is tailored to real-life applications such as live streaming and online chat, and requires (1) fast continual step inference, (2) long-term temporal modeling, and (3) zero-shot video editing capability. To meet these requirements, we propose Streaming Video Diffusion (SVDiff), which incorporates compact spatial-aware temporal recurrence into off-the-shelf Stable Diffusion and is trained with a segment-level scheme on large-scale long videos. This simple yet effective setup yields a single model that can handle a broad range of videos and edit each streaming frame with temporal coherence. Our experiments indicate that the model can edit long, high-quality videos, achieving a real-time inference speed of 15.2 FPS at a resolution of 512x512.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2405.19726
Accession Number:	edsarx.2405.19726
Database:	arXiv
