Mimicking Better by Matching the Approximate Action Distribution

Bibliographic details
Title:	Mimicking Better by Matching the Approximate Action Distribution
Authors:	Ramos, João A. Cândido; Blondé, Lionel; Takeishi, Naoya; Kalousis, Alexandros
Publication year:	2023
Collection:	Computer Science
Subject terms:	Computer Science - Machine Learning
Description:	In this paper, we introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations. MAAD utilizes a surrogate reward signal, which can be derived from various sources such as adversarial games, trajectory matching objectives, or optimal transport criteria. To compensate for the unavailability of expert actions, we rely on an inverse dynamics model that infers a plausible action distribution given the expert's state-to-state transitions; we regularize the imitator's policy by aligning it to the inferred action distribution. MAAD leads to significantly improved sample efficiency and stability. We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods. Remarkably, MAAD often stands out as the sole method capable of attaining expert performance levels, underscoring its simplicity and efficacy.
Document type:	Working Paper
Access URL:	http://arxiv.org/abs/2306.09805
Accession number:	edsarx.2306.09805
Database:	arXiv
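
Illustrative note: the abstract describes regularizing the imitator's policy by aligning it to the action distribution inferred by an inverse dynamics model. The sketch below shows one way such a regularizer could be written; it is not taken from the paper. It assumes both distributions are diagonal Gaussians, assumes a KL divergence from the inferred distribution to the policy as the alignment term (the abstract does not state the divergence or its direction), and uses hypothetical names throughout (`diag_gaussian_kl`, `regularized_policy_loss`, `coef`).

```python
import numpy as np


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = (
        np.log(std_q / std_p)
        + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q)
        - 0.5
    )
    return per_dim.sum(axis=-1)


def regularized_policy_loss(surrogate_loss, policy_mu, policy_std,
                            idm_mu, idm_std, coef=0.1):
    """Add an alignment penalty to a scalar on-policy surrogate loss.

    ASSUMPTION: the penalty is KL(inferred || policy), averaged over a batch
    of transitions and weighted by the hypothetical coefficient `coef`.
    """
    kl = diag_gaussian_kl(idm_mu, idm_std, policy_mu, policy_std)
    return surrogate_loss + coef * kl.mean()


# Toy batch: 4 transitions, 3 action dimensions.
idm_mu = np.zeros((4, 3))        # mean inferred by the inverse dynamics model
idm_std = np.ones((4, 3))
policy_mu = np.full((4, 3), 0.5)  # imitator's current action mean
policy_std = np.ones((4, 3))

loss = regularized_policy_loss(1.0, policy_mu, policy_std, idm_mu, idm_std)
print(loss)
```

In this toy setup the penalty vanishes when the policy matches the inferred distribution and grows with the squared gap between the two means, which is the alignment behaviour the abstract describes in words.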
