Model-Based Multi-Agent Reinforcement Learning in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Kaiqing Zhang, Sham M. Kakade, Tamer Basar, Lin F. Yang; 24(175):1−53, 2023.
Abstract
Model-based reinforcement learning (RL), which finds an optimal policy by planning with an empirical model estimated from data, is a fundamental approach in RL. It is particularly well-suited for multi-agent RL (MARL), since it naturally decouples learning from planning and thereby avoids the non-stationarity that arises when all agents improve their policies simultaneously. However, the sample complexity of model-based MARL algorithms has not been thoroughly examined. In this paper, we address this fundamental question of sample complexity. We focus on the basic setting of two-player discounted zero-sum Markov games, where only a generative model is available. We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to an $\epsilon$ error, as well as $\epsilon$-NE policies with a smooth planning oracle. Here, $\gamma$ is the discount factor, $S$ is the state space, and $A$ and $B$ are the action spaces of the two players. We further establish that this sample bound is minimax-optimal, up to logarithmic factors, for reward-agnostic algorithms, i.e., those that query state-transition samples without reward knowledge, by providing a matching lower bound. This contrasts with the reward-aware setting, where the sample complexity lower bound is $\tilde{\Omega}(|S|(|A|+|B|)(1-\gamma)^{-3}\epsilon^{-2})$. Our results not only demonstrate the sample efficiency of this basic model-based MARL approach, but also elucidate a fundamental tradeoff between its power (handling the reward-agnostic case easily) and its limitation (less adaptivity and suboptimal dependence on $|A|$ and $|B|$), which is particularly relevant in the multi-agent context.
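For intuition, the following is a minimal sketch (not code from the paper) of the generic model-based, generative-model approach the abstract refers to: query the generative model a fixed number of times for every (state, action, action) triple to build an empirical transition model, then plan in the empirical game via minimax value iteration. The oracle `generative_model(s, a, b)`, the function names, and all parameters here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(Q):
    """Minimax (NE) value of a matrix game Q, where the row player maximizes.
    Solved as a standard linear program over the row player's mixed strategy."""
    nA, nB = Q.shape
    # Decision variables z = [v, x_1, ..., x_nA]; maximize v subject to
    # (Q^T x)_b >= v for every column b, with x a probability distribution.
    c = np.zeros(nA + 1)
    c[0] = -1.0  # linprog minimizes, so minimize -v
    A_ub = np.hstack([np.ones((nB, 1)), -Q.T])  # v - (Q^T x)_b <= 0
    b_ub = np.zeros(nB)
    A_eq = np.hstack([[0.0], np.ones(nA)]).reshape(1, -1)  # sum_a x_a = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * nA
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0]


def empirical_model(generative_model, S, A, B, n_samples):
    """Estimate P(s' | s, a, b) by querying the generative model n_samples
    times for each (s, a, b) triple; generative_model returns a next-state index."""
    P_hat = np.zeros((S, A, B, S))
    for s in range(S):
        for a in range(A):
            for b in range(B):
                for _ in range(n_samples):
                    s_next = generative_model(s, a, b)
                    P_hat[s, a, b, s_next] += 1.0
    return P_hat / n_samples


def minimax_value_iteration(P_hat, R, gamma, n_iters=500):
    """Plan in the empirical zero-sum game: repeatedly back up the minimax value.
    R has shape (S, A, B); P_hat has shape (S, A, B, S)."""
    S = P_hat.shape[0]
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = R + gamma * P_hat @ V  # shape (S, A, B)
        V = np.array([solve_matrix_game(Q[s]) for s in range(S)])
    return V
```

Under this scheme the total number of generative-model queries is $|S||A||B|$ times the per-triple budget `n_samples`, which matches the form of the upper bound stated in the abstract when the per-triple budget scales as $(1-\gamma)^{-3}\epsilon^{-2}$ up to logarithmic factors.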