Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos

Bibliographic Details
Title: Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Authors: Hasan, Md Zahid; Chen, Jiajing; Wang, Jiyang; Rahman, Mohammed Shaiqur; Joshi, Ameya; Velipasalar, Senem; Hegde, Chinmay; Sharma, Anuj; Sarkar, Soumik
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
Description: Recognizing the activities causing distraction in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically data-intensive and require a large volume of annotated training data to detect and classify various distracted driving behaviors, thereby limiting their efficiency and scalability. We aim to develop a generalized framework that showcases robust performance with access to limited or no annotated training data. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning like distracted driving activity recognition. Vision-language pretraining models, such as CLIP, have shown significant promise in learning natural language-guided visual representations. This paper proposes a CLIP-based driver activity recognition approach that identifies driver distraction from naturalistic driving images and videos. CLIP's vision embedding offers zero-shot transfer and task-based finetuning, which can classify distracted activities from driving video data. Our results show that this framework offers state-of-the-art performance on zero-shot transfer and video-based CLIP for predicting the driver's state on two public datasets. We propose both frame-based and video-based frameworks developed on top of CLIP's visual representation for distracted driving detection and classification tasks and report the results.
Comment: 15 pages, 7 figures
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2306.10159
Accession Number: edsarx.2306.10159
Database: arXiv