Strategic Knowledge Transfer
Max Olan Smith, Thomas Anthony, Michael P. Wellman; 24(233):1−96, 2023.
Abstract
When playing or solving a game, an agent commonly faces a series of changing strategies used by other agents. These strategies often overlap: each is a different distribution over a shared set of possible policies, and the policy actually played is sampled from that distribution at the beginning of the game. As an agent faces these changing strategies, it has the opportunity to transfer what it has learned against previously encountered opponent policies. We address two problems: (1) how to transfer learned responses across changing opponent strategies, and (2) how this transfer can reduce the cumulative cost of learning in game solving. We refer to the first as the strategic knowledge transfer problem. For value-based response policies, we show that Q-Mixing can approximately solve this problem by appropriately averaging the component Q-values. Solutions to the first problem can then be applied to reduce the computational cost of learning-based game-solving algorithms. We propose two algorithms that operate within the Policy-Space Response Oracles (PSRO) framework. Mixed-Oracles reduces the per-policy construction cost by transferring responses from previously encountered opponents, while Mixed-Opponents combines the previously encountered opponents into a single novel policy to perform strategic knowledge transfer. Experimental evaluation of these methods on general-sum grid-world games provides evidence of their advantages and limitations compared to standard PSRO.
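To make the idea of averaging component Q-values concrete, the following is a minimal sketch, not the paper's implementation. It assumes that the agent holds one Q-value vector per opponent policy for the current observation, and that the weights are the probabilities the opponent's mixed strategy assigns to each policy. The function names `q_mixing_average` and `greedy_action` and the numbers in the example are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def q_mixing_average(
    component_q: Sequence[Sequence[float]],
    opponent_mix: Sequence[float],
) -> List[float]:
    """Average per-opponent Q-values, weighted by the opponent's mixed strategy.

    component_q[i][a] is the (assumed already learned) Q-value of action a
    when responding to opponent policy i, for one fixed observation.
    opponent_mix[i] is the probability that opponent policy i is played.
    """
    if len(component_q) != len(opponent_mix):
        raise ValueError("need exactly one weight per component Q-vector")
    total = sum(opponent_mix)
    if total <= 0:
        raise ValueError("opponent_mix must have positive total weight")

    n_actions = len(component_q[0])
    mixed = [0.0] * n_actions
    for q_i, weight in zip(component_q, opponent_mix):
        if len(q_i) != n_actions:
            raise ValueError("all Q-vectors must cover the same actions")
        for action, q_value in enumerate(q_i):
            # Normalise the weights so they behave as a distribution.
            mixed[action] += (weight / total) * q_value
    return mixed


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values against two opponent policies, two actions each.
q_vs_a = [1.0, 0.0]
q_vs_b = [0.0, 4.0]

# The same components are reused as the opponent's distribution changes.
mostly_a = q_mixing_average([q_vs_a, q_vs_b], [0.9, 0.1])
more_b = q_mixing_average([q_vs_a, q_vs_b], [0.75, 0.25])
```

Only the weights change between the two calls; the component Q-values are reused without further learning, which is the sense in which learned responses transfer across opponent strategies in this sketch. It averages with fixed prior weights only and does not attempt to update the weights from observations made during play.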