Publication
Title
The impatient may use limited optimism to minimize regret
Author
Abstract
Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may realize that, with hindsight, she could have increased her reward by playing differently: this difference in outcomes constitutes her regret value. The agent may thus elect to follow a regret-minimal strategy. In this paper, it is shown that (1) there always exist regret-minimal strategies that are admissible, a strategy being inadmissible if there is another strategy that always performs better; (2) computing the minimum possible regret or checking that a strategy is regret-minimal can be done in [formula missing], disregarding the computational cost of numerical analysis (otherwise, this bound becomes [formula missing]).
Language
English
Source (journal)
Lecture notes in computer science. - Berlin, 1973, currens
Source (book)
Foundations of Software Science and Computation Structures : 22nd International Conference, FOSSACS 2019, April 6–11, 2019, Prague, Czech Republic / Bojańczyk, Mikołaj [edit.]; et al.
Publication
Cham : Springer, 2019
ISSN
0302-9743 [print]
1611-3349 [online]
ISBN
978-3-030-17126-1
DOI
10.1007/978-3-030-17127-8_8
Volume/pages
11425 (2019), p. 133-149
ISI
000714952800008
UAntwerpen
Affiliation
Publications with a UAntwerp address
Record
Creation 26.10.2019
Last edited 23.10.2024