GPI-tree search: algorithms for decision-time planning with the general policy improvement theorem

Bagot, Louis; D'Eer, Lynn; Latré, Steven; De Schepper, Tom; Mets, Kevin

Abstract

In Reinforcement Learning, Unsupervised Skill Discovery tackles the learning of several policies for downstream task transfer. Once these skills are learnt, how best to use and combine them remains an open problem. The General Policy Improvement Theorem (GPI) yields a policy at least as strong as any individual skill by following, at each timestep, the highest-valued skill. However, the GPI policy cannot mix and combine the skills at decision time to formulate stronger plans. In this paper, we adopt a model-based setting to make such planning possible, and formally show that a forward search improves on the GPI policy, and on any shallower search, up to an approximation term. We argue for decision-time planning and design a family of algorithms, GPI-Tree Search Algorithms, that combine Monte Carlo Tree Search (MCTS) with GPI. These algorithms leverage the skills and Q-value priors of the GPI framework to guide and improve the search. Our quantitative experiments show that the resulting policies are much stronger than the GPI policy alone, while our qualitative results give an intuitive understanding of how each method works and of the design choices available.
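The abstract does not spell out the algorithms. The sketch below is a hypothetical, minimal Python illustration of the two ideas it builds on, and it is not the paper's GPI-Tree Search, which uses MCTS. The first idea is the GPI action rule argmax_a max_i Q^{pi_i}(s, a). The second is an exhaustive depth-limited forward search through an assumed deterministic model, with leaves bootstrapped by GPI values. The toy chain MDP, the two "always play the same action" skills, the discount 0.9, and every function name are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

State = int
Action = int
# Q-table of one skill: (state, action) -> Q^{pi_i}(s, a)
QTable = Dict[Tuple[State, Action], float]
# Hypothetical deterministic model: (state, action) -> (next_state, reward)
Model = Dict[Tuple[State, Action], Tuple[State, float]]


def gpi_value(qs: Sequence[QTable], s: State, a: Action) -> float:
    """max_i Q^{pi_i}(s, a): value of the best skill after taking a in s."""
    return max(q[(s, a)] for q in qs)


def gpi_action(qs: Sequence[QTable], s: State, actions: Sequence[Action]) -> Action:
    """GPI policy: argmax_a max_i Q^{pi_i}(s, a)."""
    return max(actions, key=lambda a: gpi_value(qs, s, a))


def lookahead_value(qs: Sequence[QTable], model: Model, s: State, a: Action,
                    actions: Sequence[Action], depth: int, gamma: float) -> float:
    """Exhaustive depth-limited search in the model; leaves use GPI values."""
    if depth == 0:
        return gpi_value(qs, s, a)
    s_next, r = model[(s, a)]
    return r + gamma * max(
        lookahead_value(qs, model, s_next, b, actions, depth - 1, gamma) for b in actions
    )


def search_action(qs: Sequence[QTable], model: Model, s: State,
                  actions: Sequence[Action], depth: int, gamma: float) -> Action:
    """Greedy root action of the search (depth=0 recovers the GPI action)."""
    return max(actions, key=lambda a: lookahead_value(qs, model, s, a, actions, depth, gamma))


def evaluate_skill(model: Model, actions: Sequence[Action], skill_action: Action,
                   gamma: float, iters: int = 50) -> QTable:
    """Q-values of a toy skill that always plays `skill_action` (iterative policy evaluation)."""
    states = {s for (s, _) in model}
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        v = {s: model[(s, skill_action)][1] + gamma * v[model[(s, skill_action)][0]]
             for s in states}
    return {(s, a): model[(s, a)][1] + gamma * v[model[(s, a)][0]]
            for s in states for a in actions}


# Toy chain: the reward of 10 needs the action sequence 1, 0, 1 from state 0;
# action 0 in state 0 pays 1 immediately but leads to the absorbing sink 4.
ACTIONS = [0, 1]
GAMMA = 0.9
MODEL: Model = {
    (0, 0): (4, 1.0), (0, 1): (1, 0.0),
    (1, 0): (2, 0.0), (1, 1): (4, 0.0),
    (2, 0): (4, 0.0), (2, 1): (3, 10.0),
    (3, 0): (3, 0.0), (3, 1): (3, 0.0),
    (4, 0): (4, 0.0), (4, 1): (4, 0.0),
}
skills = [evaluate_skill(MODEL, ACTIONS, 1, GAMMA), evaluate_skill(MODEL, ACTIONS, 0, GAMMA)]

print(gpi_action(skills, 0, ACTIONS))  # 0
print(search_action(skills, MODEL, 0, ACTIONS, depth=1, gamma=GAMMA))  # 1
```

In this toy case neither skill alone plays 1, 0, 1, so GPI takes the immediate reward of 1. A one-step lookahead through the model sees that switching skills after the first move is worth 0.9 × 9 = 8.1. This mirrors, in a very reduced form, the abstract's point that planning at decision time can combine skills that GPI alone cannot.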

Language

English

Source (book)

Adaptive and Learning Agents Workshop (ALA), co-located with AAMAS, 29-30 May 2023, London, UK

Publication

2023

Volume/pages

pp. 1-8

Full text (open access)

https://repository.uantwerpen.be/docstore/d:irua:20876

Faculty/Department

Faculty of Sciences. Mathematics and Computer Science; Faculty of Applied Engineering Sciences

Research group

Internet Data Lab (IDLab)

Project info

Research Program Artificial Intelligence

Publication type

P3 Proceeding

Subject

Engineering sciences. Technology; Computer. Automation

Affiliation

Publications with a UAntwerp address

Source file

https://alaworkshop2023.github.io/papers/ALA2023_paper_24.pdf

Identifier

c:irua:201597

Creation

14.12.2023

Last edited

12.08.2024

To cite this reference

https://hdl.handle.net/10067/2015970151162165141