Publication
Title
Deep learning of intrinsically motivated options in the Arcade Learning Environment
Author
Abstract
In Reinforcement Learning, Intrinsic Motivation drives directed behaviors through a wide range of reward-generating methods. Depending on the task and environment, these rewards can be useful and may complement each other, but they can also break down entirely, as seen with the noisy TV problem for curiosity. We therefore argue that scalability and robustness, among others, are key desirable properties of any method that incorporates intrinsic rewards, properties which a simple weighted sum of rewards lacks. In the tabular setting, Explore Options let the agent call an intrinsically motivated policy and learn from its trajectories. We introduce Deep Explore Options, which revise Explore Options within the Deep Reinforcement Learning paradigm to tackle complex visual problems. Deep Explore Options can naturally learn from several unrelated intrinsic rewards, ignore harmful intrinsic rewards, learn to balance exploration, and isolate exploitative and exploratory behaviors for independent use. We test Deep Explore Options on hard and easy exploration games of the Atari Suite, following a benchmarking study to ensure fair comparisons. Our empirical results show that they achieve results comparable to weighted-sum baselines, while maintaining their key properties.
Language
English
Source (book)
Deep Reinforcement Learning Workshop, NeurIPS 2022, 9 December 2022
Publication
2022
Volume/pages
p. 1-14
Full text (open access)
UAntwerpen
Affiliation
Publications with a UAntwerp address
Record
Creation 14.12.2023
Last edited 17.06.2024