Improving the credit assignment problem in multi-agent systems
Abstract in English
Multi-agent systems (MASs), one of the hallmarks of Distributed Artificial Intelligence (DAI), have proven efficient for modeling and implementing distributed systems. MASs are applied in software development, complex-systems modeling, intelligent traffic control, cyber-physical systems, and other areas. A central topic for MASs is learning, which most often takes the form of Reinforcement Learning (RL). Because these systems are often deployed in unknown environments, RL is a suitable learning method for them. In Multi-Agent Reinforcement Learning (MARL), agents interact partially with the environment and receive rewards or punishments as a result. The environment returns the cumulative value of these rewards and punishments as a single global reward, which is received by an entity called the critic. The critic must then decide how to distribute this global reward, which the environment provides as a single value, among the agents. This is known in the literature as the Multi-agent Credit Assignment (MCA) problem. Three approaches to it can be distinguished. The first, the equal approach, distributes the reward equally among the agents. The second, the fair approach, distributes the global reward among the agents according to their contribution to the MAS. The third, which has received less attention, aims to enhance the performance of the MAS and is referred to as the performance-increasing approach. This research focuses on the third approach. We introduce a constraint called the Task Start Threshold (TST): an agent starts working only if it receives a sufficient reward; otherwise, it does not start.
Considering this constraint and the fact that the sum of the agents' start thresholds exceeds the received global reward, the MCA problem can be treated as a bankruptcy problem. Bankruptcy, a class of problems studied in Game Theory (GT), deals with how a debtor's assets should be distributed among creditors whose total claims exceed those assets. In addition, we consider a multi-score environment as the operational environment, that is, an environment in which solving each part earns a different score from the other parts. We propose two main solutions and three auxiliary solutions to the reward assignment problem, based on the concepts of bankruptcy and evolutionary games. In the first solution, priority is given to agents with higher knowledge and higher start thresholds, which need higher rewards to start working. Because of their higher knowledge, these agents earn more points for the MAS, which increases its performance. The second solution improves on the first with a three-stage procedure that combines the bankruptcy game with an Evolutionary Game (EG). The proposed methods were evaluated in an operational environment called the Multi-score Puzzle (MsP) and compared with state-of-the-art algorithms such as COMA, VDN, and SQDDPG, as well as ranking, dynamic, and history-based methods. Simulation results indicated that the proposed methods perform better in terms of group learning rate, confidence, expertness, certainty, and correctness. Effect was the only criterion on which the proposed methods underperformed the other methods. Furthermore, the study is extended by exploring the application of MARL and MCA in Cyber-Physical Systems (CPS) and the Internet of Things (IoT). We examine challenges within CPS and IoT that can be addressed effectively through MARL and MCA techniques.
The focus then narrows to resource allocation and multi-objective problems. Notable resource allocation challenges, such as job scheduling, are examined in detail. Additionally, we present case studies involving smart homes and path optimization in an unknown maze environment, demonstrating the efficacy of MARL and MCA in these contexts. In summary, this research contributes to Multi-Agent Reinforcement Learning by proposing new solutions to the Multi-agent Credit Assignment problem that draw on bankruptcy and evolutionary game concepts. Furthermore, our exploration of CPS and IoT applications underlines the broad practical relevance of these techniques, supported by case studies that show their benefits in real-world scenarios.
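The bankruptcy framing of credit assignment described in the abstract can be made concrete with a small sketch. The code below is an illustration only, not the thesis's algorithm: it treats each agent's Task Start Threshold as a claim and the global reward as the estate, assuming (as the bankruptcy definition requires) that the claims total more than the estate. The function names, agent labels, and numbers are hypothetical. It contrasts the classic proportional rule with a simple greedy rule that pays the highest thresholds first, in the spirit of the first solution's priority for agents with higher thresholds.

```python
from typing import Dict, List


def proportional_rule(estate: float, claims: Dict[str, float]) -> Dict[str, float]:
    """Classic proportional bankruptcy rule: share = estate * claim / total claims."""
    total = sum(claims.values())
    return {agent: estate * claim / total for agent, claim in claims.items()}


def priority_rule(estate: float, claims: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical greedy rule: pay claims in full, largest first,
    skipping any claim the remaining estate cannot cover."""
    remaining = estate
    shares = {agent: 0.0 for agent in claims}
    for agent, claim in sorted(claims.items(), key=lambda kv: kv[1], reverse=True):
        if claim <= remaining:
            shares[agent] = claim
            remaining -= claim
    return shares


def started_agents(shares: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """An agent starts working only if its share reaches its Task Start Threshold."""
    return sorted(a for a in thresholds if shares[a] >= thresholds[a])


# Illustrative numbers: thresholds sum to 14, global reward is 10.
thresholds = {"a1": 5.0, "a2": 3.0, "a3": 2.0, "a4": 4.0}
global_reward = 10.0

print(started_agents(proportional_rule(global_reward, thresholds), thresholds))  # []
print(started_agents(priority_rule(global_reward, thresholds), thresholds))      # ['a1', 'a4']
```

Under these assumed numbers the proportional rule gives every agent less than its threshold, so no agent starts, while the greedy priority rule lets the two agents with the highest thresholds start and leaves one unit of reward unallocated. This is only meant to show why the threshold constraint changes which division rules are useful.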
Tehran : IAU University Science and Research Branch & University of Antwerp, 2023
xx, 106 p.
Supervisor: Shiri, M.E.
Supervisor: Challenger, M.
Full text (publisher's version - intranet only)
Creation 11.12.2023
Last edited 12.12.2023