Publication
Title
Distributed critics using counterfactual value decomposition in multi-agent reinforcement learning
Author
Abstract
In cooperative multi-agent reinforcement learning, the credit assignment problem limits the ability of the agents to learn a policy. Many state-of-the-art methods use a centralised critic to overcome this problem. However, a centralised critic limits the scalability of multi-agent systems that follow the centralised training and decentralised execution paradigm. State-of-the-art approaches have attempted to overcome this limitation with factorisation methods. Unfortunately, these factorisation methods are not usable in every jointly observable environment. This paper presents the Counterfactual Value Decomposition Critics (CVDC) method, which follows a paradigm of decentralised training with free critic communication and decentralised execution. The CVDC method uses the insight that any Q-function is decomposable into a set of agent-specific Q-functions. This property is combined with counterfactual reasoning to create a set of decomposed, communicating critics that is usable within every jointly observable environment. Each agent-specific critic is then used to train the local policy of its agent without the need for any centralised training structure. We evaluate the CVDC method and compare it with state-of-the-art baselines in a set of environments from the Multi Particle Environments. The results show that our method outperforms the baseline algorithms in both training time and obtained return, even when parameter sharing is disabled.
Language
English
Source (book)
Adaptive and Learning Agents Workshop (ALA), collocated with AAMAS, 29-30 May, 2023, London, UK
Publication
2023
Volume/pages
p. 1-9
Full text (open access)
UAntwerpen
Affiliation
Publications with a UAntwerp address
Creation 14.12.2023
Last edited 17.06.2024