Title
Distributed critics using counterfactual value decomposition in multi-agent reinforcement learning

Author

Abstract
In cooperative multi-agent reinforcement learning, the credit assignment problem limits the ability of agents to learn effective policies. Many state-of-the-art methods use a centralised critic to overcome this credit assignment problem. However, a centralised critic limits the scalability of multi-agent systems that follow the centralised training and decentralised execution paradigm. State-of-the-art methods have attempted to overcome this limitation with factorisation methods. Unfortunately, these factorisation methods are not applicable in every jointly observable environment. This paper presents the Counterfactual Value Decomposition Critics (CVDC) method, which follows a paradigm of decentralised training with free critic communication and decentralised execution. CVDC builds on the insight that any Q-function can be decomposed into a set of agent-specific Q-functions. This property is combined with counterfactual reasoning to create a set of decomposed, communicating critics that is usable in every jointly observable environment. Each agent-specific critic is then used to train the local policy of its agent without any centralised training structure. We evaluate and compare the CVDC method with other state-of-the-art baselines on a set of environments from the Multi Particle Environments. The results show that our method outperforms the baseline algorithms in training time and obtained return, even when parameter sharing is disabled.
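
To make the decomposition and counterfactual reasoning mentioned in the abstract concrete, the sketch below shows one common way such ideas are realised. It is not the CVDC implementation: it assumes a simple additive decomposition of the joint Q-function into agent-specific critics and a COMA-style counterfactual baseline, and the names AgentCritic and counterfactual_advantage are hypothetical.

import torch
import torch.nn as nn

# Hypothetical agent-specific critic: agent i estimates Q_i from its own
# observation. A joint value could be formed by summing agent-specific
# Q-functions (additive decomposition; the paper's decomposition may differ).
class AgentCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns Q_i(o_i, a) for every local action a.
        return self.net(obs)

def counterfactual_advantage(q_i: torch.Tensor, pi_i: torch.Tensor,
                             action_i: torch.Tensor) -> torch.Tensor:
    # COMA-style counterfactual baseline: the value of the chosen action is
    # compared against the policy-weighted average over the agent's own
    # actions, isolating that agent's contribution for credit assignment.
    baseline = (pi_i * q_i).sum(dim=-1)   # E_{a ~ pi_i}[Q_i(o_i, a)]
    chosen = q_i.gather(-1, action_i.unsqueeze(-1)).squeeze(-1)
    return chosen - baseline

Under these assumptions, each agent's local policy gradient could be weighted by this advantage, so that training remains decentralised apart from whatever critic communication the method prescribes.
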
Language
English

Source (book)
Adaptive and Learning Agents Workshop (ALA), collocated with AAMAS, 29-30 May, 2023, London, UK

Publication
2023

Volume/pages
p. 1-9

Full text (open access)