Report
An Adiabatic Theorem for Policy Tracking with TD-learning
Title: | An Adiabatic Theorem for Policy Tracking with TD-learning |
---|---|
Authors: | Walton, Neil |
Publication Year: | 2020 |
Collection: | Computer Science Mathematics |
Subject Terms: | Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Probability |
Description: | We evaluate the ability of temporal difference learning to track the reward function of a policy as it changes over time. Our results apply a new adiabatic theorem that bounds the mixing time of time-inhomogeneous Markov chains. We derive finite-time bounds for tabular temporal difference learning and $Q$-learning when the policy used for training changes in time. To achieve this, we develop bounds for stochastic approximation under asynchronous adiabatic updates. |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2010.12848 |
Accession Number: | edsarx.2010.12848 |
Database: | arXiv |