Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

Bibliographic Details
Title: Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality
Authors: Grand-Clement, Julien, Petrik, Marek, Vieille, Nicolas
Publication Year: 2023
Collection: Computer Science; Mathematics
Subject Terms: Mathematics - Optimization and Control, Computer Science - Computer Science and Game Theory
Description: Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known about average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to 1). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs but, perhaps surprisingly, that history-dependent (Markovian) policies strictly outperform stationary policies for average optimality in s-rectangular RMDPs. We also study Blackwell optimality for sa-rectangular RMDPs, where we show that approximate Blackwell optimal policies always exist, although Blackwell optimal policies may not exist. We also provide a sufficient condition for their existence, which covers virtually all examples from the literature. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games.
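The description ties average optimality to discount factors close to 1 and to stochastic games. As an illustration only, here is a minimal sketch, assuming an sa-rectangular RMDP whose uncertainty set for each state-action pair is a finite list of candidate transition distributions. It runs discounted robust value iteration (the agent maximizes, nature minimizes, as in a two-player game) and rescales the result by (1 - gamma), a classical heuristic proxy for the long-run average return. This is not one of the paper's algorithms; the functions `robust_value_iteration` and `normalized_value` and the two-state example are hypothetical.

```python
import numpy as np


def robust_value_iteration(rewards, uncertainty, gamma, tol=1e-10, max_iter=100_000):
    """Discounted robust value iteration for a finite sa-rectangular RMDP (hypothetical helper).

    rewards[s][a]: immediate reward r(s, a).
    uncertainty[s][a]: finite list of candidate next-state distributions for (s, a).
    Nature picks the worst distribution independently for each (s, a), which is
    what sa-rectangularity permits.
    """
    n_states = len(rewards)
    v = np.zeros(n_states)
    for _ in range(max_iter):
        v_new = np.empty(n_states)
        for s in range(n_states):
            # Robust Bellman update: the agent maximizes over actions,
            # nature minimizes the expected continuation value per (s, a).
            v_new[s] = max(
                rewards[s][a]
                + gamma * min(float(np.dot(p, v)) for p in uncertainty[s][a])
                for a in range(len(rewards[s]))
            )
        # Sup-norm stopping rule; the robust operator is a gamma-contraction.
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


def normalized_value(rewards, uncertainty, gamma):
    """(1 - gamma) * v_gamma, used here only as a heuristic proxy for the average return."""
    return (1.0 - gamma) * robust_value_iteration(rewards, uncertainty, gamma)


# Hypothetical two-state example. In state 0, action 0 pays 1.0 but nature may
# send the agent to the absorbing zero-reward state 1; action 1 pays 0.5 and
# keeps the agent in state 0 for sure.
rewards = [[1.0, 0.5], [0.0]]
uncertainty = [
    [[[1.0, 0.0], [0.0, 1.0]],  # (s=0, a=0): two candidate distributions
     [[1.0, 0.0]]],             # (s=0, a=1): no ambiguity
    [[[0.0, 1.0]]],             # (s=1, a=0): absorbing
]

for gamma in (0.9, 0.99, 0.999):
    print(gamma, normalized_value(rewards, uncertainty, gamma))
```

In this toy instance nature always answers action 0 by moving to state 1, so robust play in state 0 repeats action 1 and the normalized value of state 0 is 0.5 for each listed gamma, while state 1 stays at 0. Whether such discounted limits recover robust average or Blackwell optimal behavior in general is precisely the question the paper studies, so the sketch should not be read as settling it.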
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2312.03618
Accession Number: edsarx.2312.03618
Database: arXiv