Adversarial Bandits with Knapsacks

التفاصيل البيبلوغرافية
العنوان: Adversarial Bandits with Knapsacks
المؤلفون: Nicole Immorlica, Robert E. Schapire, Aleksandrs Slivkins, Karthik Abinav Sankararaman
المصدر: FOCS
بيانات النشر: Association for Computing Machinery (ACM), 2022.
سنة النشر: 2022
مصطلحات موضوعية: FOS: Computer and information sciences, Mathematical optimization, Computer Science - Machine Learning, Computer science, Machine Learning (stat.ML), Time horizon, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Upper and lower bounds, Machine Learning (cs.LG), Scheduling (computing), Statistics - Machine Learning, Artificial Intelligence, Computer Science - Data Structures and Algorithms, 0202 electrical engineering, electronic engineering, information engineering, Common value auction, Data Structures and Algorithms (cs.DS), 0105 earth and related environmental sciences, Competitive analysis, 16. Peace & justice, Knapsack problem, Hardware and Architecture, Control and Systems Engineering, Dynamic pricing, Repeated game, 020201 artificial intelligence & image processing, Software, Information Systems
الوصف: We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the "classic" adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to the algorithm's reward. We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version and use it as a subroutine to solve the latter.
The extended abstract appeared in FOCS 2019. The definitive version was published in JACM '22. V8 is the latest version with all technical changes. Subsequent versions fixes minor LATEX presentation issues
تدمد: 1557-735X
0004-5411
URL الوصول: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::dc14322f3986bb8cbd19c1836dc7d0bb
https://doi.org/10.1145/3557045
حقوق: OPEN
رقم الأكسشن: edsair.doi.dedup.....dc14322f3986bb8cbd19c1836dc7d0bb
قاعدة البيانات: OpenAIRE