Title
Learning to communicate using counterfactual reasoning
Author
Abstract
Learning to communicate in order to share state information is an active problem in multi-agent reinforcement learning (MARL). The credit assignment problem, the non-stationarity of the communication environment, and the need for agents to be influenced by incoming messages are major challenges in this research field that must be overcome to learn a valid communication protocol. This paper introduces multi-agent counterfactual communication learning (MACC), a novel method that adapts counterfactual reasoning to overcome the credit assignment problem for communicating agents. The non-stationarity of the communication environment while learning the communication Q-function is then overcome by constructing the communication Q-function from the action policies of the other agents and the Q-function of the action environment. As this exact construction of the communication Q-function can be computationally intensive for a large number of agents, two approximation methods are proposed. Additionally, a social loss function is introduced to create influenceable agents, which is required to learn a valid communication protocol. Our experiments show that MACC outperforms state-of-the-art baselines in four different scenarios in the Particle environment. Finally, we demonstrate the scalability of MACC in a matrix environment.
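The counterfactual credit assignment the abstract refers to can be illustrated with a minimal sketch: an agent's chosen action is scored against a counterfactual baseline that marginalises its own action out of the joint Q-function while the other agents' actions stay fixed. The function name and the concrete numbers below are illustrative, not taken from the paper.

```python
# Minimal sketch of counterfactual-baseline credit assignment (the idea MACC
# adapts for communicating agents). All names and values are illustrative.

def counterfactual_advantage(q_values, policy, action):
    """Advantage of `action` over a policy-weighted counterfactual baseline.

    q_values: Q(s, u) for each alternative action u of one agent, with the
              other agents' actions held fixed.
    policy:   pi(u | s), this agent's probability for each action u.
    action:   index of the action actually taken.
    """
    # Baseline: expected Q if the agent had sampled from its own policy,
    # everything else in the joint action unchanged.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[action] - baseline

# Example: three actions; the taken action's Q exceeds the baseline,
# so this agent receives positive credit for the joint outcome.
adv = counterfactual_advantage([1.0, 2.0, 0.5], [0.2, 0.5, 0.3], action=1)
print(adv)  # 2.0 - 1.35 = 0.65
```

MACC applies this style of reasoning to messages as well as actions, so that each agent can be credited for how its communication changed the outcome.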
Language
English
Source (book)
Adaptive and Learning Agents Workshop (ALA), co-located with AAMAS, 11-13 May 2022, Auckland, New Zealand
Publication
2022
Volume/pages
p. 1-9
Medium
E-only publication
Full text (open access)