Multi Agent Reinforcement Learning

Background

In cooperative MARL, one central challenge is coping with the size of the joint action space, which grows exponentially with the number of agents.

(Game-theoretic methods suffer from the same problem.)
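To make the exponential growth concrete, here is a minimal sketch that counts and enumerates joint actions. The function names and the choice of 5 actions per agent are assumptions made for illustration only.

```python
from itertools import product


def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Number of joint actions when every agent has n_actions choices."""
    return n_actions ** n_agents


def enumerate_joint_actions(n_agents: int, n_actions: int):
    """Explicitly list every joint action (only feasible for tiny problems)."""
    return list(product(range(n_actions), repeat=n_agents))


# With 5 actions per agent, the joint action space quickly becomes unmanageable.
for n_agents in (1, 2, 5, 10):
    print(n_agents, "agents:", joint_action_count(n_agents, 5), "joint actions")
```

A table-based joint value function would need one entry per joint action for every state, which is why some form of generalization is needed.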

Efficient MARL must therefore be able to generalize over large joint action spaces, in the same way that CNNs allow deep RL to generalize over large visual state spaces.

One solution:

Decentralize the decision policy and/or the value function.

The joint value function can be factorized into utility functions that each depend only on the actions of one agent:

Q(a1,a2,a3|s)=f1(a1|s)+f2(a2|s)+f3(a3|s)

Each f is a payoff (utility) function that can be approximated.

The joint value function can then be maximized efficiently if each agent simply selects the action that maximizes its own utility function.
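The claim that per-agent maximization recovers the joint greedy action can be checked on a small example. This is a sketch under assumptions: the utility tables below are made-up numbers for one fixed state, and the helper names are hypothetical.

```python
from itertools import product

# Hypothetical utility tables f_i(a_i | s) for a single fixed state s:
# three agents with three actions each (values are invented for illustration).
utilities = [
    [0.2, 1.5, -0.3],  # f1
    [0.9, 0.1, 0.4],   # f2
    [-1.0, 0.0, 2.0],  # f3
]


def joint_q(joint_action):
    """Q(a1, a2, a3 | s) = f1(a1 | s) + f2(a2 | s) + f3(a3 | s)."""
    return sum(f[a] for f, a in zip(utilities, joint_action))


def decentralized_greedy():
    """Each agent independently picks the argmax of its own utility."""
    return tuple(max(range(len(f)), key=f.__getitem__) for f in utilities)


def brute_force_greedy():
    """Search all joint actions; cost is exponential in the number of agents."""
    n_actions = len(utilities[0])
    return max(product(range(n_actions), repeat=len(utilities)), key=joint_q)


print(decentralized_greedy(), brute_force_greedy())
```

Because the sum has no cross terms between agents, the two searches agree, but the decentralized one costs only the number of agents times the number of actions.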

Problem - relative overgeneralization:

During exploration, other agents act randomly, and the punishment caused by uncooperative agents may outweigh the rewards that would be achievable with coordinated actions.
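A small matrix game illustrates the effect. The payoff values below are an assumption chosen for illustration: one coordinated joint action pays well, but miscoordination on that action is punished heavily.

```python
# Hypothetical two-agent cooperative matrix game with a shared reward.
# Rows index agent 1's action, columns index agent 2's action.
payoff = [
    [8, -12, -12],
    [-12, 0, 0],
    [-12, 0, 0],
]


def utilities_under_random_partner(matrix):
    """Average reward of each row action when the partner acts uniformly at random."""
    n_cols = len(matrix[0])
    return [sum(row) / n_cols for row in matrix]


agent1_utilities = utilities_under_random_partner(payoff)
individual_choice = max(range(len(agent1_utilities)), key=agent1_utilities.__getitem__)

# The best joint action, found by looking at the full payoff matrix.
best_joint = max(
    ((i, j) for i in range(3) for j in range(3)),
    key=lambda ij: payoff[ij[0]][ij[1]],
)

print(agent1_utilities, individual_choice, best_joint)
```

Against a randomly exploring partner, action 0 looks worse on average than the safe actions, so an agent that only learns its own utility drifts away from the coordinated optimum at (0, 0).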

Solution:

Higher-order factorization, such as a coordination graph:

Q(a1,a2,a3|s)=f12(a1,a2|s)+f23(a2,a3|s)

Although the value can no longer be maximized by each agent individually, the greedy action can be found by message passing along the edges of the graph (also known as belief propagation).
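For the chain a1 - a2 - a3 used in the formula above, message passing can be sketched as follows. The payoff tables are invented numbers, and the function names are hypothetical. On a tree-shaped graph such as this chain the result is exact; graphs with cycles generally need an iterative, approximate variant.

```python
from itertools import product

# Hypothetical pairwise payoff tables for one fixed state s.
f12 = [  # f12[a1][a2]
    [8, -12, -12],
    [-12, 0, 0],
    [-12, 0, 0],
]
f23 = [  # f23[a2][a3]
    [1, 0, 3],
    [0, 2, 0],
    [4, 0, 0],
]


def joint_q(a1, a2, a3):
    """Q(a1, a2, a3 | s) = f12(a1, a2 | s) + f23(a2, a3 | s)."""
    return f12[a1][a2] + f23[a2][a3]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def greedy_by_message_passing():
    n = len(f12)
    # Agents 1 and 3 each tell agent 2 the best payoff they can reach per a2.
    m12 = [max(f12[a1][a2] for a1 in range(n)) for a2 in range(n)]
    m32 = [max(f23[a2][a3] for a3 in range(n)) for a2 in range(n)]
    # Agent 2 combines the incoming messages and commits to its action.
    a2 = argmax([m12[k] + m32[k] for k in range(n)])
    # Agents 1 and 3 best-respond to agent 2's choice.
    a1 = argmax([f12[k][a2] for k in range(n)])
    a3 = argmax([f23[a2][k] for k in range(n)])
    return a1, a2, a3


print(greedy_by_message_passing())
```

Each message only ranges over one neighbour's actions, so the cost scales with the number of edges rather than with the size of the joint action space.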

State-of-the-art value factorization approaches, e.g. VDN and QMIX, condition an agent's utility on its history, that is, its past observations and actions, and share the parameters of all utility functions.

One agent's experience is therefore used to train all agents.
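The effect of parameter sharing can be sketched with a single table standing in for the shared utility network. This is not VDN or QMIX themselves: the class, the tabular representation, the learning rate, and the training target are all assumptions made to keep the example small.

```python
from collections import defaultdict


class SharedUtility:
    """One utility table f(action | history) used by every agent."""

    def __init__(self, lr: float = 0.5):
        self.lr = lr
        self.table = defaultdict(float)  # (history, action) -> utility estimate

    def value(self, history, action):
        return self.table[(history, action)]

    def update(self, history, action, target):
        """Move the estimate a step of size lr towards a (hypothetical) target."""
        key = (history, action)
        self.table[key] += self.lr * (target - self.table[key])


shared = SharedUtility()

# A history is the agent's past observations and actions, here a plain tuple.
history = ("obs_0", 1, "obs_1")

# Agent A experiences this history and receives a training target of 1.0.
shared.update(history, action=1, target=1.0)

# Agent B has never been trained, yet sees the improved estimate
# for the same history because the parameters are shared.
print(shared.value(history, action=1))
```

With a neural network in place of the table, the same idea applies: every agent's transitions contribute gradients to one set of weights.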

MARL Q-learning algorithms:

Refs

Reviews

[1] OroojlooyJadid, A. and Hajinezhad, D. A review of cooperative multi-agent deep reinforcement learning. CoRR, abs/1908.03963, 2019.

Coordination Graph

[2] Guestrin, C., Koller, D., and Parr, R. Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems 14. The MIT Press, 2002.

[3] Guestrin, C., Lagoudakis, M., and Parr, R. Coordinated reinforcement learning. In Proceedings of the Nineteenth International Conference on Machine Learning, vol. 2, pp. 227–234, 2002.

CG equipped MARL

Sparse cooperative Q-learning
[4] Kok, J. R. and Vlassis, N. Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research, 7(Sep):1789–1828, 2006.

Jiexin Wang
