Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind

Bibliographic Details
Title: Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
Authors: Plizzari, Chiara; Goel, Shubham; Perrett, Toby; Chalk, Jacob; Kanazawa, Angjoo; Damen, Dima
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
Description: As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind - 3D tracking active objects using observations captured through an egocentric camera. We introduce Lift, Match and Keep (LMK), a method which lifts partial 2D observations to 3D world coordinates, matches them over time using visual appearance, 3D location and interactions to form object tracks, and keeps these object tracks even when they go out-of-view of the camera - hence keeping in mind what is out of sight. We test LMK on 100 long videos from EPIC-KITCHENS. Our results demonstrate that spatial cognition is critical for correctly locating objects over short and long time scales. E.g., for one long egocentric video, we estimate the 3D location of 50 active objects. Of these, 60% can be correctly positioned in 3D after 2 minutes of leaving the camera view.
Comment: 21 pages including references and appendix. Project Webpage: http://dimadamen.github.io/OSNOM/
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2404.05072
Accession Number: edsarx.2404.05072
Database: arXiv