Mimicking Better by Matching the Approximate Action Distribution

Bibliographic Details
Title: Mimicking Better by Matching the Approximate Action Distribution
Authors: Ramos, João A. Cândido; Blondé, Lionel; Takeishi, Naoya; Kalousis, Alexandros
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning
Description: In this paper, we introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations. MAAD utilizes a surrogate reward signal, which can be derived from various sources such as adversarial games, trajectory matching objectives, or optimal transport criteria. To compensate for the non-availability of expert actions, we rely on an inverse dynamics model that infers a plausible action distribution given the expert's state-state transitions; we regularize the imitator's policy by aligning it to the inferred action distribution. MAAD leads to significantly improved sample efficiency and stability. We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods. Remarkably, MAAD often stands out as the sole method capable of attaining expert performance levels, underscoring its simplicity and efficacy.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2306.09805
Accession Number: edsarx.2306.09805
Database: arXiv
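
Illustrative sketch: The description says the imitator's policy is regularized by aligning it to the action distribution inferred by an inverse dynamics model. The record does not state which divergence, distribution family, or weighting the authors use, so the following is only a minimal sketch under assumptions: both distributions are taken to be diagonal Gaussians, the alignment term is taken to be a KL divergence, and `beta` is a hypothetical regularization weight. The function names are illustrative and do not come from the paper.

```python
import math
from typing import Sequence


def gaussian_kl(
    mu_p: Sequence[float],
    std_p: Sequence[float],
    mu_q: Sequence[float],
    std_q: Sequence[float],
) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over action dimensions.

    Assumption: p is the policy's action distribution and q is the
    distribution inferred by an inverse dynamics model. The choice of
    KL and of its direction is an illustration, not the paper's stated method.
    """
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def regularized_loss(policy_loss: float, kl: float, beta: float = 0.1) -> float:
    """Policy objective plus a weighted alignment penalty (beta is hypothetical)."""
    return policy_loss + beta * kl


# Example: a one-dimensional action whose policy mean is off by 1.0.
kl = gaussian_kl([1.0], [1.0], [0.0], [1.0])
loss = regularized_loss(policy_loss=1.0, kl=kl, beta=0.2)
```

In this sketch the penalty is zero when the policy already matches the inferred distribution and grows as the two drift apart, which is one simple way to express "aligning" two action distributions.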