Report
Smooth Non-Stationary Bandits
| Title | Smooth Non-Stationary Bandits |
|---|---|
| Authors | Jia, Su; Xie, Qian; Kallus, Nathan; Frazier, Peter I. |
| Publication Year | 2023 |
| Collection | Computer Science; Mathematics; Statistics |
| Subject Terms | Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Mathematics - Optimization and Control; Mathematics - Statistics Theory |
| Description | In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, and they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments often change **smoothly**, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandit problem where we assume that an arm's mean reward is a $\beta$-Hölder function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result with an $\Omega(T^{(\beta+1)/(2\beta+1)})$ lower bound for any integer $\beta\ge 1$, which matches our upper bound for $\beta=2$. Comment: Accepted by ICML 2023 |
| Document Type | Working Paper |
| Access URL | http://arxiv.org/abs/2301.12366 |
| Accession Number | edsarx.2301.12366 |
| Database | arXiv |
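To make the setting in the abstract concrete, the sketch below simulates a two-armed bandit whose mean rewards vary smoothly with normalized time $t/T$, and runs a sliding-window UCB policy against it. This is a minimal illustration only: the sinusoidal reward curves, the window size, and the sliding-window UCB baseline are assumptions for demonstration, not the paper's environment or policy (the paper's $\tilde O(T^{3/5})$ algorithm for $\beta=2$ is not reproduced here).

```python
import math
import random

def smooth_means(t, T):
    """Mean rewards of the two arms as smooth functions of normalized
    time t/T (infinitely differentiable, hence beta-Holder for every
    beta). These particular curves are illustrative, not from the paper."""
    x = t / T
    return (0.5 + 0.3 * math.sin(2 * math.pi * x),
            0.5 + 0.3 * math.cos(2 * math.pi * x))

def run_sliding_window_ucb(T, window=200, seed=0):
    """Sliding-window UCB, a standard baseline for non-stationary bandits
    (not the paper's policy): each arm is estimated only from its pulls in
    the last `window` rounds, so stale observations are forgotten.
    Returns the cumulative (pseudo-)regret over T rounds."""
    rng = random.Random(seed)
    history = [[], []]  # per arm: list of (round, reward) pairs
    regret = 0.0
    for t in range(T):
        mu = smooth_means(t, T)
        ucb = []
        for a in (0, 1):
            # Drop observations that fell out of the sliding window.
            history[a] = [(s, r) for (s, r) in history[a] if s > t - window]
            obs = [r for (_, r) in history[a]]
            if not obs:
                ucb.append(float("inf"))  # force at least one pull
            else:
                bonus = math.sqrt(2 * math.log(min(t + 1, window)) / len(obs))
                ucb.append(sum(obs) / len(obs) + bonus)
        a = 0 if ucb[0] >= ucb[1] else 1
        reward = 1.0 if rng.random() < mu[a] else 0.0  # Bernoulli reward
        history[a].append((t, reward))
        regret += max(mu) - mu[a]  # pseudo-regret against the best arm now
    return regret
```

Because the means drift, the identity of the better arm flips twice per period, so any fixed-arm policy pays linear regret; the sliding window lets the UCB estimates track the drift at the cost of extra exploration inside each window.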