An Active Exploration Method for Data Efficient Reinforcement Learning

Bibliographic details
Title:	An Active Exploration Method for Data Efficient Reinforcement Learning
Authors:	Zhao Dongfang, Liu Jiafeng, Wu Rui, Cheng Dansong, Tang Xianglong
Source:	International Journal of Applied Mathematics and Computer Science, Vol 29, Iss 2, Pp 351-362 (2019)
Publication information:	Sciendo, 2019.
Publication year:	2019
Collection:	LCC:Mathematics LCC:Electronic computers. Computer science
Subject terms:	reinforcement learning, information entropy, pilco, data efficiency, Mathematics, QA1-939, Electronic computers. Computer science, QA75.5-76.95
Description:	Reinforcement learning (RL) constitutes an effective method of controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is the improvement of data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model dynamic systems. However, it only focuses on optimizing cumulative rewards and does not consider the accuracy of a dynamic model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose its active exploration version (AEPILCO) that utilizes information entropy to describe samples. In the policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Through the informative policy evaluation function, our algorithm obtains informative policy parameters in the policy improvement stage. Using the policy parameters in the actual execution produces an informative sample set; this is helpful in learning an accurate dynamic model. Thus, the AEPILCO algorithm improves data efficiency by learning an accurate dynamic model by actively selecting informative samples based on the information entropy criterion. We demonstrate the validity and efficiency of the proposed algorithm for several challenging controller problems involving a cart pole, a pendubot, a double pendulum, and a cart double pendulum. The AEPILCO algorithm can learn a controller using fewer trials compared to PILCO. This is verified through theoretical analysis and experimental results.
Document type:	article
File description:	electronic resource
Language:	English
ISSN:	2083-8492 2019-0026
Relation:	https://doaj.org/toc/2083-8492
DOI:	10.2478/amcs-2019-0026
Access URL:	https://doaj.org/article/f6dc2b7911544c01b01453a6d6a5e207
Accession number:	edsdoj.f6dc2b7911544c01b01453a6d6a5e207
Database:	Directory of Open Access Journals
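
Illustrative note: the abstract describes adding an information entropy criterion to long-term prediction during policy evaluation. The record does not give the formulation, so the following is only a minimal sketch under stated assumptions: predicted states are Gaussian with diagonal covariance, higher predictive entropy is treated as "more informative", and the entropy term is subtracted from the summed expected cost with a trade-off coefficient. The names `gaussian_entropy`, `informative_objective`, and `weight` are hypothetical and are not taken from the paper.

```python
import math


def gaussian_entropy(variances):
    """Differential entropy (in nats) of a Gaussian with diagonal covariance.

    For variances v_1..v_d the entropy is 0.5 * sum(log(2 * pi * e * v_i)).
    """
    return 0.5 * sum(math.log(2.0 * math.pi * math.e * v) for v in variances)


def informative_objective(expected_costs, predicted_variances, weight=0.1):
    """Hypothetical entropy-augmented policy evaluation (not the paper's formula).

    expected_costs: expected cost at each predicted time step of a rollout.
    predicted_variances: per-step lists of state variances (diagonal covariance).
    weight: assumed trade-off between cost and the entropy bonus.
    """
    if len(expected_costs) != len(predicted_variances):
        raise ValueError("one variance list is required per time step")
    total_cost = sum(expected_costs)
    total_entropy = sum(gaussian_entropy(v) for v in predicted_variances)
    # Lower is better: rollouts with higher predictive entropy receive a bonus,
    # which nudges the policy toward states the model is less sure about.
    return total_cost - weight * total_entropy


# Two rollouts with the same expected cost but different predictive uncertainty.
confident = informative_objective([1.0, 1.0], [[0.01], [0.01]])
uncertain = informative_objective([1.0, 1.0], [[0.5], [0.5]])
print(confident > uncertain)
```

With equal expected cost, the rollout with the larger predicted variance scores lower, so a minimizer would prefer it. Setting `weight=0` recovers a plain cumulative-cost evaluation.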
