Adaptation Augmented Model-based Policy Optimization

Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang; 24(218):1−35, 2023.

Abstract

Model-based reinforcement learning (RL) is often more sample-efficient than model-free RL because it uses a learned dynamics model to assist decision making. However, the learned model is typically not perfectly accurate, and its errors compound over multi-step predictions, leading to suboptimal asymptotic performance. This paper presents an upper bound on the return discrepancy between the real dynamics and the learned model, which highlights the issue of distribution shift between simulated and real data. Based on this analysis, the authors propose an adaptation augmented model-based policy optimization (AMPO) framework that addresses the distribution shift through feature learning and instance re-weighting. The feature-based variant, FAMPO, incorporates unsupervised model adaptation to minimize the integral probability metric (IPM) between the feature distributions of real and simulated data, while the instance-based variant, IAMPO, uses importance sampling to re-weight the real samples used for model training. Beyond model learning, the authors also improve policy optimization during the model usage phase by sampling simulated transitions with probabilities that depend on their uncertainty. Extensive experiments on challenging continuous control tasks show that FAMPO and IAMPO, combined with the proposed model usage technique, outperform baseline methods, demonstrating the effectiveness of the proposed approaches.
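To make the feature-based adaptation idea concrete, below is a minimal sketch of one common IPM instance, the maximum mean discrepancy (MMD) under an RBF kernel, computed between feature batches from real and simulated data. The paper's FAMPO variant may use a different IPM or estimator; the `encoder`, `model_loss`, and `lambda_adapt` names in the usage comment are hypothetical and only illustrate where such an adaptation term could enter the model-learning objective.

```python
import torch

def gaussian_mmd(x, y, bandwidth=1.0):
    """Squared MMD between two feature batches under an RBF kernel.

    MMD is one instance of an integral probability metric (IPM);
    this is an illustrative choice, not necessarily the paper's.
    """
    def rbf(a, b):
        # Pairwise squared Euclidean distances, then RBF kernel values.
        d2 = torch.cdist(a, b, p=2.0) ** 2
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))

    # Biased (V-statistic) MMD^2 estimate, sufficient for a sketch.
    return rbf(x, x).mean() + rbf(y, y).mean() - 2.0 * rbf(x, y).mean()

# Hypothetical usage: an encoder maps (state, action) pairs to features,
# and the IPM term is added to the usual dynamics-model loss.
# feat_real = encoder(real_states, real_actions)
# feat_sim  = encoder(sim_states, sim_actions)
# loss = model_loss + lambda_adapt * gaussian_mmd(feat_real, feat_sim)
```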
