Regressing Transformers for Data-efficient Visual Place Recognition

Bibliographic Details
Title: Regressing Transformers for Data-efficient Visual Place Recognition
Authors: Leyva-Vallina, María; Strisciuglio, Nicola; Petkov, Nicolai
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Description: Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained to have small distance for similar images and larger distance for dissimilar ones in a latent space. However, this approach struggles to ensure accurate distance-based image similarity representation, particularly when training with binary pairwise labels, and complex re-ranking strategies are required. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking, offering data-efficient training and strong generalization across several benchmark datasets.
Comment: Accepted for publication in ICRA 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2401.16304
Accession Number: edsarx.2401.16304
Database: arXiv
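
Illustrative note: the description contrasts contrastive training on binary pair labels with regression against graded similarity labels. The sketch below is one way to picture that difference. It is not the authors' implementation. The function names, the margin value, and the choice of mean squared error on cosine similarity are all assumptions made for illustration, since the record does not specify the loss.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) descriptor arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def contrastive_loss(a, b, binary_labels, margin=0.5):
    """Classic contrastive loss on binary pair labels (1 = same place).

    The margin value is an arbitrary illustrative choice.
    """
    d = np.linalg.norm(a - b, axis=1)
    positive_term = binary_labels * d ** 2
    negative_term = (1.0 - binary_labels) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(positive_term + negative_term))


def similarity_regression_loss(a, b, graded_labels):
    """Assumed regression objective: mean squared error between predicted
    descriptor similarity and a graded label in [0, 1], such as a camera
    field-of-view overlap score."""
    predicted = cosine_similarity(a, b)
    return float(np.mean((predicted - graded_labels) ** 2))


if __name__ == "__main__":
    # Two hypothetical descriptor pairs: one identical, one orthogonal.
    a = np.array([[1.0, 0.0], [1.0, 0.0]])
    b = np.array([[1.0, 0.0], [0.0, 1.0]])
    graded = np.array([0.5, 0.0])  # hypothetical overlap scores
    print(similarity_regression_loss(a, b, graded))
```

With a graded label, a pair that only partially overlaps is penalized for being scored as identical, which a binary label cannot express.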