Report
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Title: | DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models |
---|---|
Authors: | Tian, Xiaoyu; Gu, Junru; Li, Bailin; Liu, Yicheng; Wang, Yang; Zhao, Zhiyong; Zhan, Kun; Jia, Peng; Lang, Xianpeng; Zhao, Hang |
Publication Year: | 2024 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computer Vision and Pattern Recognition |
Description: | A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of reasoning modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that synergizes the strengths of DriveVLM with the traditional autonomous driving pipeline. Experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the efficacy of DriveVLM and DriveVLM-Dual in handling complex and unpredictable driving conditions. Finally, we deploy DriveVLM-Dual on a production vehicle, verifying that it is effective in real-world autonomous driving environments. Comment: Project Page: https://tsinghua-mars-lab.github.io/DriveVLM/ |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2402.12289 |
Accession Number: | edsarx.2402.12289 |
Database: | arXiv |