When is exponential asymptotic optimality achievable in average-reward restless bandits?

التفاصيل البيبلوغرافية
العنوان: When is exponential asymptotic optimality achievable in average-reward restless bandits?
المؤلفون: Hong, Yige, Xie, Qiaomin, Chen, Yudong, Wang, Weina
سنة النشر: 2024
المجموعة: Computer Science
Mathematics
مصطلحات موضوعية: Computer Science - Machine Learning, Mathematics - Optimization and Control, Mathematics - Probability, 90C40, G.3, I.6
الوصف: We consider the discrete-time infinite-horizon average-reward restless bandit problem. We propose a novel policy that maintains two dynamic subsets of arms: one subset of arms has a nearly optimal state distribution and takes actions according to an Optimal Local Control routine; the other subset of arms is driven towards the optimal state distribution and gradually merged into the first subset. We show that our policy is asymptotically optimal with an $O(\exp(-C N))$ optimality gap for an $N$-armed problem, under the mild assumptions of aperiodic-unichain, non-degeneracy, and local stability. Our policy is the first to achieve exponential asymptotic optimality under the above set of easy-to-verify assumptions, whereas prior work either requires a strong Global Attractor assumption or only achieves an $O(1/\sqrt{N})$ optimality gap. We further discuss the fundamental obstacles in significantly weakening our assumptions. In particular, we prove a lower bound showing that local stability is fundamental for exponential asymptotic optimality.
Comment: 46 pages, 1 figure
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2405.17882
رقم الأكسشن: edsarx.2405.17882
قاعدة البيانات: arXiv