An Adiabatic Theorem for Policy Tracking with TD-learning

Bibliographic Details
Title:	An Adiabatic Theorem for Policy Tracking with TD-learning
Authors:	Walton, Neil
Publication Year:	2020
Collection:	Computer Science Mathematics
Subject Terms:	Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Probability
Description:	We evaluate the ability of temporal difference learning to track the reward function of a policy as it changes over time. Our results apply a new adiabatic theorem that bounds the mixing time of time-inhomogeneous Markov chains. We derive finite-time bounds for tabular temporal difference learning and $Q$-learning when the policy used for training changes in time. To achieve this, we develop bounds for stochastic approximation under asynchronous adiabatic updates.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2010.12848
Accession Number:	edsarx.2010.12848
Database:	arXiv
