Title
Distributed critics using counterfactual value decomposition in multi-agent reinforcement learning

Author

Abstract
In cooperative multi-agent reinforcement learning, the credit assignment problem limits the ability of agents to learn effective policies. Many state-of-the-art methods use a centralised critic to overcome this credit assignment problem. However, a centralised critic limits the scalability of multi-agent systems that follow the centralised training and decentralised execution paradigm. State-of-the-art methods have attempted to overcome this limitation with factorisation methods. Unfortunately, these factorisation methods are not applicable in every jointly observable environment. This paper presents the Counterfactual Value Decomposition Critics (CVDC) method, which follows a paradigm of decentralised training with free critic communication and decentralised execution. CVDC builds on the insight that any Q-function can be decomposed into a set of agent-specific Q-functions. This property is combined with counterfactual reasoning to create a set of decomposed, communicating critics that is usable in every jointly observable environment. Each agent-specific critic is then used to train the local policy of its agent without any centralised training structure. We evaluate and compare the CVDC method with other state-of-the-art baselines on a set of environments from the Multi Particle Environments. The results show that our method outperforms the baseline algorithms in training time and obtained return, even when parameter sharing is disabled.
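
To make the decomposition and counterfactual reasoning mentioned in the abstract concrete, the sketch below shows one common way such ideas are realised. It is not the CVDC implementation: it assumes a simple additive decomposition of the joint Q-function into agent-specific critics and a COMA-style counterfactual baseline, and the names AgentCritic and counterfactual_advantage are hypothetical.

import torch
import torch.nn as nn

# Hypothetical agent-specific critic: agent i estimates Q_i from its own
# observation. A joint value could be formed by summing agent-specific
# Q-functions (additive decomposition; the paper's decomposition may differ).
class AgentCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns Q_i(o_i, a) for every local action a.
        return self.net(obs)

def counterfactual_advantage(q_i: torch.Tensor, pi_i: torch.Tensor,
                             action_i: torch.Tensor) -> torch.Tensor:
    # COMA-style counterfactual baseline: the value of the chosen action is
    # compared against the policy-weighted average over the agent's own
    # actions, isolating that agent's contribution for credit assignment.
    baseline = (pi_i * q_i).sum(dim=-1)   # E_{a ~ pi_i}[Q_i(o_i, a)]
    chosen = q_i.gather(-1, action_i.unsqueeze(-1)).squeeze(-1)
    return chosen - baseline

Under these assumptions, each agent's local policy gradient could be weighted by this advantage, so that training remains decentralised apart from whatever critic communication the method prescribes.
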
Language
English

Source (book)
Adaptive and Learning Agents Workshop (ALA), collocated with AAMAS, 29-30 May, 2023, London, UK

Publication
2023

Volume/pages
p. 1-9

Full text (open access)