Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?

Authors: Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan; Published: 24(35):1–52, 2023.

Abstract

This study focuses on multi-player general-sum Markov games with a designated leader and the remaining players as followers. The followers are myopically rational, meaning they aim to maximize their instantaneous rewards. The goal is to find a Stackelberg-Nash equilibrium (SNE), a policy pair $(\pi^*, \nu^*)$ satisfying the following conditions: (i) $\pi^*$ is the optimal policy for the leader when the followers always play their best response, and (ii) $\nu^*$ is the followers' best-response policy, i.e., a Nash equilibrium of the followers' game induced by $\pi^*$. Sample-efficient reinforcement learning (RL) algorithms are developed to solve for an SNE in both the online and offline settings. These algorithms are optimistic and pessimistic variants of least-squares value iteration, and they can incorporate function approximation tools for large state spaces. Additionally, in the case of linear function approximation, the algorithms are proven to achieve sublinear regret and suboptimality in the online and offline settings, respectively. These are the first provably efficient RL algorithms for solving SNEs in general-sum Markov games with myopically rational followers.
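As a worked restatement of conditions (i) and (ii) (the leader's value function $V_1^{\pi,\nu}$ and the best-response map $\mathcal{BR}(\cdot)$ below are illustrative notation, not necessarily that of the paper), an SNE can be viewed as the bilevel problem

$$\pi^* \in \operatorname*{arg\,max}_{\pi} \; V_1^{\pi,\, \mathcal{BR}(\pi)}, \qquad \nu^* = \mathcal{BR}(\pi^*),$$

where, because the followers are myopically rational, $\mathcal{BR}(\pi)$ prescribes in every state a Nash equilibrium of the one-step game over instantaneous rewards induced by the leader's policy $\pi$.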
