PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Bibliographic Details
Title: PlacidDreamer: Advancing Harmony in Text-to-3D Generation
Authors: Huang, Shuo, Sun, Shikun, Wang, Zixuan, Qin, Xiaoyu, Xiong, Yanmin, Zhang, Yuan, Wan, Pengfei, Zhang, Di, Jia, Jia
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition, I.4.0
Description: Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations. Firstly, they encounter conflicts in generation directions since different models aim to produce diverse 3D assets. Secondly, the issue of over-saturation in score distillation has not been thoroughly investigated and solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while simultaneously employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images to personalize the text-to-image diffusion model. To address the over-saturation problem, we propose to view score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation algorithm, which offers a Pareto Optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the outstanding capabilities of our PlacidDreamer. The code is available at https://github.com/HansenHuang0823/PlacidDreamer.
Comment: Accepted by ACM Multimedia 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2407.13976
Accession Number: edsarx.2407.13976
Database: arXiv