Multi-modal Discriminative Model for Vision-and-Language Navigation

Bibliographic details
Title: Multi-modal Discriminative Model for Vision-and-Language Navigation
Authors: Jason Baldridge, Harsh Mehta, Vihan Jain, Eugene Ie, Haoshuo Huang
Publication year: 2019
Subject terms: FOS: Computer and information sciences, Discriminator, Parsing, Computer Science - Computation and Language, Computer science, Generalization, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Context (language use), Machine learning, Task (project management), Discriminative model, Benchmark (computing), Artificial intelligence, Computation and Language (cs.CL), Natural language
Description: Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals. Successful agents must be able to parse natural language of varying linguistic styles, ground it in potentially unfamiliar scenes, and plan and react under ambiguous environmental feedback. Generalization ability is limited by the amount of human-annotated data. In particular, paired vision-language sequence data is expensive to collect. We develop a discriminator that uses multi-modal alignment to evaluate how well an instruction explains a given path in the VLN task. Our study reveals that only a small fraction of the high-quality augmented data from Fried et al. (2018), as scored by our discriminator, is useful for training VLN agents with similar performance on previously unseen environments. We also show that a VLN agent warm-started with pre-trained components from the discriminator outperforms the benchmark success rate of 35.5 by 10% (relative) on previously unseen environments.
Accepted at SpLU-RoboNLP 2019 (workshop at NAACL)
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::beb45273c899be78d312868e509a5889
http://arxiv.org/abs/1905.13358
Rights: OPEN
Accession number: edsair.doi.dedup.....beb45273c899be78d312868e509a5889
Database: OpenAIRE