Publication
Title
Finite-memory near-optimal learning for Markov decision processes with long-run average reward
Author
Abstract
We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms also for these cases.
Language
English
Source (journal)
Journal of machine learning research. - Cambridge, Mass.
Source (book)
Conference on Uncertainty in Artificial Intelligence, UAI
Publication
Cambridge, Mass. : Microtome Publishing, 2020
ISSN
1532-4435
Volume/pages
124 (2020), p. 1149-1158
UAntwerpen
Faculty/Department
Research group
Project info
SAILor: Safe Artificial Intelligence and Learning for Verification.
Publication type
Subject
Affiliation
Publications with a UAntwerp address
Record
Identifier
Creation 30.09.2020
Last edited 17.06.2024