SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation

التفاصيل البيبلوغرافية
العنوان: SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation
المؤلفون: Reed, Alec, Crowe, Brendan, Albin, Doncey, Achey, Lorin, Hayes, Bradley, Heckman, Christoffer
سنة النشر: 2024
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Robotics
الوصف: When exploring new areas, robotic systems generally exclusively plan and execute controls over geometry that has been directly measured. When entering space that was previously obstructed from view such as turning corners in hallways or entering new rooms, robots often pause to plan over the newly observed space. To address this we present SceneScene, a real-time 3D diffusion model for synthesizing 3D occupancy information from partial observations that effectively predicts these occluded or out of view geometries for use in future planning and control frameworks. SceneSense uses a running occupancy map and a single RGB-D camera to generate predicted geometry around the platform at runtime, even when the geometry is occluded or out of view. Our architecture ensures that SceneSense never overwrites observed free or occupied space. By preserving the integrity of the observed map, SceneSense mitigates the risk of corrupting the observed space with generative predictions. While SceneSense is shown to operate well using a single RGB-D camera, the framework is flexible enough to extend to additional modalities. SceneSense operates as part of any system that generates a running occupancy map `out of the box', removing conditioning from the framework. Alternatively, for maximum performance in new modalities, the perception backbone can be replaced and the model retrained for inference in new applications. Unlike existing models that necessitate multiple views and offline scene synthesis, or are focused on filling gaps in observed data, our findings demonstrate that SceneSense is an effective approach to estimating unobserved local occupancy information at runtime. Local occupancy predictions from SceneSense are shown to better represent the ground truth occupancy distribution during the test exploration trajectories than the running occupancy map.
Comment: 8 pages, 6 figures
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2403.11985
رقم الأكسشن: edsarx.2403.11985
قاعدة البيانات: arXiv