Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits

التفاصيل البيبلوغرافية
العنوان: Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits
المؤلفون: Zhou, Julien, Gaillard, Pierre, Rahier, Thibaud, Zenati, Houssam, Arbel, Julyan
سنة النشر: 2024
المجموعة: Computer Science
Mathematics
Statistics
مصطلحات موضوعية: Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Machine Learning
الوصف: We address the problem of stochastic combinatorial semi-bandits, where a player selects among $P$ actions from the power set of a set containing $d$ base items. Adaptivity to the problem's structure is essential in order to obtain optimal regret upper bounds. As estimating the coefficients of a covariance matrix can be manageable in practice, leveraging them should improve the regret. We design ``optimistic'' covariance-adaptive algorithms relying on online estimations of the covariance structure, called OLSUCBC and COSV (only the variances for the latter). They both yields improved gap-free regret. Although COSV can be slightly suboptimal, it improves on computational complexity by taking inspiration from Thompson Sampling approaches. It is the first sampling-based algorithm satisfying a $\sqrt{T}$ gap-free regret (up to poly-logs). We also show that in some cases, our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches, not only in exponential regimes where $P\gg d$ but also when $P\leq d$, which is not covered by existing analyses.
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2402.15171
رقم الأكسشن: edsarx.2402.15171
قاعدة البيانات: arXiv