Lightweight Language-driven Grasp Detection using Conditional Consistency Model

Bibliographic Details
Title: Lightweight Language-driven Grasp Detection using Conditional Consistency Model
Authors: Nguyen, Nghia; Vu, Minh Nhat; Huang, Baoru; Vuong, An; Le, Ngan; Vo, Thieu; Nguyen, Anh
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Description: Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages lightweight diffusion models to achieve fast inference. By integrating diffusion processes with grasping prompts in natural language, our method effectively encodes visual and textual information, enabling more accurate and versatile grasp positioning that aligns well with the text query. To overcome the long inference time of diffusion models, we use the image and text features as the condition in a consistency model, which reduces the number of denoising timesteps during inference. Extensive experimental results show that our method outperforms other recent grasp detection methods and lightweight diffusion models by a clear margin. We further validate our method in real-world robotic experiments to demonstrate its fast inference.
Comment: Accepted at IROS 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2407.17967
Accession Number: edsarx.2407.17967
Database: arXiv