Report
Language Reward Modulation for Pretraining Reinforcement Learning
Title: Language Reward Modulation for Pretraining Reinforcement Learning
Authors: Adeniji, Ademi; Xie, Amber; Sferrazza, Carmelo; Seo, Younggyo; James, Stephen; Abbeel, Pieter
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
Description: Using learned reward functions (LRFs) to solve sparse-reward reinforcement learning (RL) tasks has yielded steady progress in task complexity over the years. In this work, we question whether today's LRFs are best suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose **LA**nguage Reward **M**odulated **P**retraining (LAMP), which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a *pretraining* utility for RL rather than as a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped, exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards, in conjunction with standard novelty-seeking exploration rewards, via reinforcement learning to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, a departure from previous attempts to use LRFs, can warm-start sample-efficient learning on robot manipulation tasks in RLBench.
Comment: Code available at https://github.com/ademiadeniji/lamp
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2308.12270
Accession Number: edsarx.2308.12270
Database: arXiv
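The contrastive-alignment reward described in the abstract can be sketched roughly as follows. This is a minimal illustration only, assuming precomputed CLIP-style image and instruction embeddings; the function name `vlm_alignment_reward` and its inputs are hypothetical and not taken from the LAMP codebase.

```python
import numpy as np

def vlm_alignment_reward(image_emb: np.ndarray, instruction_embs: np.ndarray) -> float:
    """Hypothetical sketch of a VLM-based exploration reward.

    Computes the cosine alignment between one image-observation embedding
    and a batch of language-instruction embeddings, and returns the best
    alignment as a shaped (but noisy) scalar reward.
    """
    # L2-normalize so the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = instruction_embs / np.linalg.norm(instruction_embs, axis=1, keepdims=True)
    scores = txt @ img          # one cosine score per instruction
    return float(scores.max())  # reward from the best-aligned instruction

# Usage with toy embeddings (a real system would use frozen VLM encoders).
rng = np.random.default_rng(0)
reward = vlm_alignment_reward(rng.normal(size=512), rng.normal(size=(8, 512)))
```

Cosine similarity keeps the reward bounded in [-1, 1], which makes it easy to mix with standard novelty-seeking exploration bonuses during pretraining.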