Steady-State Policy Synthesis in Multichain Markov Decision Processes

التفاصيل البيبلوغرافية
العنوان: Steady-State Policy Synthesis in Multichain Markov Decision Processes
المؤلفون: Andre Beckus, Ismail Alkhouri, George K. Atia, Alvaro Velasquez
المصدر: IJCAI
بيانات النشر: International Joint Conferences on Artificial Intelligence Organization, 2020.
سنة النشر: 2020
مصطلحات موضوعية: Steady state (electronics), Computer science, Statistical physics, Markov decision process
الوصف: The formal synthesis of automated or autonomous agents has elicited strong interest from the artificial intelligence community in recent years. This problem space broadly entails the derivation of decision-making policies for agents acting in an environment such that a formal specification of behavior is satisfied. Popular formalisms for such specifications include the quintessential Linear Temporal Logic (LTL) and Computation Tree Logic (CTL) which reason over infinite sequences and trees, respectively, of states. However, the related and relevant problem of reasoning over the frequency with which states are visited infinitely and enforcing behavioral specifications on the same has received little attention. That problem, known as Steady-State Policy Synthesis (SSPS) or steady-state control, is the focus of this paper. Prior related work has been mostly confined to unichain Markov Decision Processes (MDPs), while a tractable solution to the general multichain setting heretofore remains elusive. In this paper, we provide a solution to the latter within the context of multichain MDPs over a class of policies that account for all possible transitions in the given MDP. The solution policy is derived from a novel linear program (LP) that encodes constraints on the limiting distributions of the Markov chain induced by said policy. We establish a one-to-one correspondence between the feasible solutions of the LP and the stationary distributions of the induced Markov chains. The derived policy is shown to maximize the reward among the constrained class of stationary policies and to satisfy the specification constraints even when it does not exercise all possible transitions.
URL الوصول: https://explore.openaire.eu/search/publication?articleId=doi_________::f95b270df68f0bcdbb40f1e3e592f5da
https://doi.org/10.24963/ijcai.2020/563
حقوق: OPEN
رقم الأكسشن: edsair.doi...........f95b270df68f0bcdbb40f1e3e592f5da
قاعدة البيانات: OpenAIRE