تقرير
L4GM: Large 4D Gaussian Reconstruction Model
العنوان: | L4GM: Large 4D Gaussian Reconstruction Model |
---|---|
المؤلفون: | Ren, Jiawei, Xie, Kevin, Mirzaei, Ashkan, Liang, Hanxue, Zeng, Xiaohui, Kreis, Karsten, Liu, Ziwei, Torralba, Antonio, Fidler, Sanja, Kim, Seung Wook, Ling, Huan |
سنة النشر: | 2024 |
المجموعة: | Computer Science |
مصطلحات موضوعية: | Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning |
الوصف: | We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep our L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We showcase that L4GM that is only trained on synthetic data generalizes extremely well on in-the-wild videos, producing high quality animated 3D assets. Comment: Project page: https://research.nvidia.com/labs/toronto-ai/l4gm |
نوع الوثيقة: | Working Paper |
URL الوصول: | http://arxiv.org/abs/2406.10324 |
رقم الأكسشن: | edsarx.2406.10324 |
قاعدة البيانات: | arXiv |
الوصف غير متاح. |