Non-stationary Online Learning with Memory and Non-stochastic Control

Peng Zhao, Yu-Hu Yan, Yu-Xiang Wang, Zhi-Hua Zhou; 24(206):1−70, 2023.

Abstract

This paper investigates the problem of Online Convex Optimization (OCO) with memory, in which loss functions may depend on past decisions, thereby capturing the temporal effects of learning problems. The authors propose a novel algorithm for OCO with memory that achieves optimal dynamic policy regret in terms of the time horizon, the non-stationarity measure, and the memory length. The key technical challenge is controlling the switching cost, i.e., the cumulative movement of the learner's decisions. This is addressed by a switching-cost-aware online ensemble approach, equipped with a new meta-base decomposition of dynamic policy regret and a carefully designed meta-learner and base-learner that explicitly regularize the switching cost. The results are further applied to tackle non-stationarity in online non-stochastic control, that is, controlling a linear dynamical system with adversarial disturbances and convex cost functions. The authors derive a novel gradient-based controller with dynamic policy regret guarantees, which is the first controller proven competitive with a sequence of changing policies for online non-stochastic control.
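To make the two-layer structure concrete, the following is a minimal illustrative sketch of a switching-cost-aware meta-base online ensemble, written for this summary rather than reproducing the paper's exact algorithm or notation: base learners run online gradient descent with different step sizes to cover unknown non-stationarity, and a Hedge-style meta-learner weights them using surrogate losses that add a penalty on each base learner's own movement. The class and function names (MetaBaseEnsemble, project_ball), the step-size grid, and the parameters lam and meta_lr are all illustrative assumptions.

    # Illustrative sketch (not the paper's exact algorithm): a two-layer meta-base
    # online ensemble for OCO with switching cost.
    import numpy as np

    def project_ball(x, radius=1.0):
        """Euclidean projection onto a ball of the given radius (assumed domain)."""
        norm = np.linalg.norm(x)
        return x if norm <= radius else x * (radius / norm)

    class MetaBaseEnsemble:
        def __init__(self, dim, T, lam=1.0, meta_lr=1.0):
            # Exponentially spaced step sizes cover the unknown non-stationarity level.
            self.etas = [2.0 ** (-k) / np.sqrt(T) for k in range(int(np.log2(T)) + 1)]
            self.base_x = [np.zeros(dim) for _ in self.etas]      # base decisions
            self.prev_base_x = [np.zeros(dim) for _ in self.etas]
            self.weights = np.ones(len(self.etas)) / len(self.etas)
            self.lam = lam          # weight on the switching-cost regularizer (assumed)
            self.meta_lr = meta_lr  # Hedge learning rate (assumed tuning)

        def decide(self):
            # Final decision is the weighted combination of base decisions.
            return sum(w * x for w, x in zip(self.weights, self.base_x))

        def update(self, grad):
            # Surrogate loss of each base learner: linearized loss plus its movement.
            losses = np.array([
                float(grad @ x) + self.lam * np.linalg.norm(x - px)
                for x, px in zip(self.base_x, self.prev_base_x)
            ])
            # Hedge-style multiplicative update on the meta weights.
            self.weights *= np.exp(-self.meta_lr * (losses - losses.min()))
            self.weights /= self.weights.sum()
            # Each base learner takes a projected gradient step with its own step size.
            self.prev_base_x = [x.copy() for x in self.base_x]
            self.base_x = [project_ball(x - eta * grad)
                           for x, eta in zip(self.base_x, self.etas)]

    # Usage on a toy stream of linear losses f_t(x) = <g_t, x>.
    rng = np.random.default_rng(0)
    learner = MetaBaseEnsemble(dim=5, T=100)
    for t in range(100):
        x_t = learner.decide()
        g_t = rng.normal(size=5)   # gradient of the current loss at x_t
        learner.update(g_t)

The design choice illustrated here is the one highlighted in the abstract: the switching cost enters the meta-learner's surrogate losses explicitly, so the ensemble favors base learners that track non-stationarity without moving excessively.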
