[Submitted on 29 Aug 2023]
Title: Adversarial Style Transfer for Robust Policy Optimization in Deep Reinforcement Learning
Authors: Md Masudur Rahman, Yexiang Xue
Abstract: This paper proposes an algorithm that improves the generalization of reinforcement learning agents by reducing overfitting to confounding features. The approach is built on a max-min game-theoretic objective. A generator transfers the style of observations during reinforcement learning, perturbing each observation to maximize the agent’s probability of taking a different action. In turn, the policy network updates its parameters to minimize the effect of these perturbations, which keeps it robust while it maximizes the expected future reward. The objective is implemented in a practical deep reinforcement learning algorithm, Adversarial Robust Policy Optimization (ARPO), which aims to find a robust policy that generalizes to unseen environments. ARPO is evaluated on Procgen and the Distracting Control Suite, where it shows improved performance compared to several baseline algorithms, including data augmentation.
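The max-min interplay described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The linear softmax policy, the fixed list of candidate style shifts standing in for a learned generator, the KL divergence as the measure of "taking a different action", and every name below (`action_shift`, `generator_step`, `robust_loss`, `beta`) are assumptions made only for illustration.

```python
import math

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(weights, obs):
    """Toy linear policy (assumption): one weight row per action."""
    logits = [sum(w * x for w, x in zip(row, obs)) for row in weights]
    return softmax(logits)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def apply_style(obs, delta):
    """Hypothetical style transfer: an additive shift of the observation."""
    return [x + d for x, d in zip(obs, delta)]

def action_shift(weights, obs, styled_obs):
    """How much the action distribution changes under the styled observation."""
    return kl_divergence(policy(weights, obs), policy(weights, styled_obs))

def generator_step(weights, obs, candidates):
    """Max player: choose the style shift that changes the policy's actions most."""
    return max(candidates,
               key=lambda d: action_shift(weights, obs, apply_style(obs, d)))

def robust_loss(task_loss, weights, obs, styled_obs, beta=1.0):
    """Min player: task loss plus a penalty for reacting to the style change."""
    return task_loss + beta * action_shift(weights, obs, styled_obs)

# Feature 0 is task-relevant; feature 1 plays the role of a confounding "style".
obs = [1.0, 0.5]
candidates = [[0.0, -1.0], [0.0, 0.5], [0.0, 2.0]]  # shifts touch only the style feature
overfit = [[1.0, 2.0], [-1.0, -2.0]]  # policy that relies on the style feature
robust = [[1.0, 0.0], [-1.0, 0.0]]    # policy that ignores the style feature

worst = generator_step(overfit, obs, candidates)
overfit_shift = action_shift(overfit, obs, apply_style(obs, worst))
robust_shift = action_shift(
    robust, obs, apply_style(obs, generator_step(robust, obs, candidates)))
print(worst, overfit_shift, robust_shift)
```

In this sketch the policy that depends on the style feature incurs a positive penalty under the generator's chosen shift, while the policy that ignores it incurs none, which is the behavior the minimizing player is pushed toward.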
Submission history
From: Md Masudur Rahman
[v1] Tue, 29 Aug 2023 18:17:35 UTC (24,919 KB)