F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha; 24(178):1−75, 2023.

Abstract

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complex applications due to the lack of interaction between agents, the curse of dimensionality, and computational complexity. As a result, decentralized MARL algorithms have been developed. However, existing decentralized methods handle only the fully cooperative setting and require a large amount of information to be transmitted during training. While the block coordinate gradient descent scheme used in these methods simplifies calculations, it introduces significant bias. This paper presents a flexible fully-decentralized actor-critic MARL framework that can incorporate most actor-critic methods and handle large-scale general cooperative multi-agent scenarios. The framework employs a primal-dual hybrid gradient descent algorithm to enable decentralized learning by individual agents. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, ensuring stability in multi-agent policy learning. Additionally, the proposed framework achieves scalability and stability in large-scale environments. Information transmission is reduced through parameter sharing and a novel modeling-other-agents approach based on theory-of-mind and online supervised learning. Extensive experiments conducted in the Cooperative Multi-agent Particle Environment and StarCraft II demonstrate that the proposed decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.
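
To make the decentralized training idea concrete, below is a minimal, self-contained Python sketch of per-agent actor-critic learning under local information only: each agent keeps its own actor and critic and updates both jointly from its local transitions. This is an illustration of the general fully-decentralized actor-critic pattern, not the paper's F2A2 primal-dual algorithm; the toy environment, reward, update rules, and hyperparameters are assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the paper's F2A2 algorithm. Each agent holds a
# linear critic V(o) and a softmax policy pi(a|o), observes only its own local
# observation, and performs a joint policy-improvement / value-evaluation update.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_OBS, N_ACTIONS = 3, 4, 2      # assumed toy sizes
ALPHA_PI, ALPHA_V, GAMMA = 0.05, 0.1, 0.95  # assumed hyperparameters


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


class Agent:
    """One decentralized agent with its own actor and critic parameters."""

    def __init__(self):
        self.theta = np.zeros((N_ACTIONS, N_OBS))  # actor parameters
        self.w = np.zeros(N_OBS)                   # critic parameters

    def act(self, obs):
        probs = softmax(self.theta @ obs)
        return rng.choice(N_ACTIONS, p=probs), probs

    def update(self, obs, action, probs, reward, next_obs, done):
        # Joint value-evaluation and policy-improvement step from local data only.
        td_target = reward + (0.0 if done else GAMMA * self.w @ next_obs)
        td_error = td_target - self.w @ obs
        self.w += ALPHA_V * td_error * obs                 # critic (TD) step
        grad_log = -probs[:, None] * obs[None, :]          # d log pi(a|o) / d theta
        grad_log[action] += obs
        self.theta += ALPHA_PI * td_error * grad_log       # actor (policy-gradient) step


def toy_env_step(actions):
    # Assumed toy cooperative reward: agents are rewarded for jointly picking action 1.
    reward = float(sum(actions)) / N_AGENTS
    next_obs = [rng.normal(size=N_OBS) for _ in range(N_AGENTS)]
    return reward, next_obs


agents = [Agent() for _ in range(N_AGENTS)]
obs = [rng.normal(size=N_OBS) for _ in range(N_AGENTS)]
for step in range(200):
    acts, probs = zip(*(ag.act(o) for ag, o in zip(agents, obs)))
    reward, next_obs = toy_env_step(acts)
    for ag, o, a, p, no in zip(agents, obs, acts, probs, next_obs):
        ag.update(o, a, p, reward, no, done=False)  # each agent learns from local data
    obs = next_obs
```

In the paper's framework, such per-agent updates are coordinated through a primal-dual hybrid gradient scheme and reduced communication (parameter sharing and modeling other agents), neither of which is shown in this toy sketch.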
