VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation

Bibliographic Details
Title:	VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation
Authors:	Wang, Weiyao; Lei, Yutian; Jin, Shiyu; Hager, Gregory D.; Zhang, Liangjun
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Robotics
Description:	In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recognizing the correct pose for the hand, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state-of-the-art, with a 12% absolute improvement, increasing from 65% to 77% over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code implementation can be found at our project site: https://vihe-3d.github.io.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2403.11461
Accession Number:	edsarx.2403.11461
Database:	arXiv
