Audio-Visual Traffic Light State Detection for Urban Robots

Bibliographic Details
Title: Audio-Visual Traffic Light State Detection for Urban Robots
Authors: Gupta, Sagar; Cosgun, Akansel
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Robotics
Description: We present a multimodal traffic light state detection method that uses vision and sound, from the viewpoint of a quadruped robot navigating in urban settings. This is a challenging problem because of visual occlusions and noise from robot locomotion. Our method combines features from raw audio with the ratios of red and green pixels within bounding boxes identified by established vision-based detectors. The fusion method aggregates features across multiple frames in a given timeframe, increasing robustness and adaptability. Results show that our approach effectively addresses the challenge of visual occlusion and surpasses the performance of single-modality solutions when the robot is in motion. This study serves as a proof of concept, highlighting the significant, yet often overlooked, potential of multimodal perception in robotics.
Comment: Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2404.19281
Accession Number: edsarx.2404.19281
Database: arXiv
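
Illustrative note: the description above mentions two visual ingredients, the ratios of red and green pixels inside a detected bounding box and the aggregation of features across several frames. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the RGB thresholds, the (x0, y0, x1, y1) box format, and the use of a plain average over frames are all assumptions made for illustration; the audio features and the actual fusion model from the paper are not represented here.

```python
def color_ratios(frame, box):
    """Return (red_ratio, green_ratio) for the pixels inside box.

    frame: list of rows, each row a list of (r, g, b) tuples in 0..255.
    box:   (x0, y0, x1, y1), an assumed format; x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = box
    pixels = [px for row in frame[y0:y1] for px in row[x0:x1]]
    if not pixels:
        return 0.0, 0.0
    # Assumed thresholds: a pixel counts as red (or green) when that
    # channel is bright and clearly dominates the other two channels.
    red = sum(1 for r, g, b in pixels if r > 150 and r > g + 50 and r > b + 50)
    green = sum(1 for r, g, b in pixels if g > 150 and g > r + 50 and g > b + 50)
    return red / len(pixels), green / len(pixels)


def aggregate(per_frame_ratios):
    """Average per-frame (red, green) ratios over a window of frames."""
    if not per_frame_ratios:
        return 0.0, 0.0
    n = len(per_frame_ratios)
    red_mean = sum(r for r, _ in per_frame_ratios) / n
    green_mean = sum(g for _, g in per_frame_ratios) / n
    return red_mean, green_mean


if __name__ == "__main__":
    RED, GREEN, DARK = (255, 0, 0), (0, 255, 0), (0, 0, 0)
    frame_a = [[RED] * 4, [RED] * 4, [DARK] * 4, [DARK] * 4]  # top half lit red
    frame_b = [[GREEN] * 4 for _ in range(4)]                 # fully green
    window = [color_ratios(f, (0, 0, 4, 4)) for f in (frame_a, frame_b)]
    print(aggregate(window))
```

Averaging over a window is one simple way to smooth out frames in which the light is briefly occluded; a real system would pass the aggregated visual features, together with audio features, to a classifier.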