Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

التفاصيل البيبلوغرافية
العنوان: Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits
المؤلفون: Hong, Yige, Xie, Qiaomin, Chen, Yudong, Wang, Weina
سنة النشر: 2024
المجموعة: Computer Science
Mathematics
مصطلحات موضوعية: Computer Science - Machine Learning, Mathematics - Optimization and Control, Mathematics - Probability, 90C40, G.3, I.6
الوصف: We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, provided that the single-armed MDP is unichain and aperiodic under the optimal single-armed policy. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Uniform Global Attractor Property (UGAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).
Comment: 49 pages, 3 figures. This version adds details on the unichain condition, stationary distribution, and long-run time average
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2402.05689
رقم الأكسشن: edsarx.2402.05689
قاعدة البيانات: arXiv