Single Timescale Actor-Critic Method for Solving the Linear Quadratic Regulator with Convergence Guarantees
Mo Zhou, Jianfeng Lu; 24(222):1−34, 2023.
Abstract
We propose a single timescale actor-critic algorithm for solving the linear quadratic regulator (LQR) problem. The critic utilizes a least squares temporal difference (LSTD) method, while the actor employs a natural policy gradient method. We provide a proof of convergence with a sample complexity of $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1})^2)$. The approach used in the proof is applicable to general single timescale bilevel optimization problems. Additionally, we validate our theoretical results on convergence through numerical experiments.
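The single timescale idea, in which critic and actor each take one step per iteration with constant step sizes, can be illustrated by a minimal sketch. This is not the authors' algorithm. It assumes a scalar, noise-free, discrete-time LQR with known model parameters, and it replaces the sample-based LSTD critic with a relaxation step on the Bellman equation. The function names, the step sizes `alpha` and `beta`, and the problem constants are all hypothetical choices made for illustration.

```python
import math


def riccati_solution(a, b, q, r):
    """Closed-form optimum of the scalar discrete-time LQR (reference only)."""
    # Positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0.
    lin = r - q * b**2 - a**2 * r
    p_star = (-lin + math.sqrt(lin**2 + 4 * b**2 * q * r)) / (2 * b**2)
    k_star = a * b * p_star / (r + b**2 * p_star)
    return k_star, p_star


def single_timescale_actor_critic(a, b, q, r, k0, alpha=0.5, beta=0.2, iters=200):
    """Update the critic p and the actor k once per iteration.

    Dynamics x' = a x + b u, policy u = -k x, stage cost q x^2 + r u^2.
    Assumption: the model (a, b, q, r) is known, so no sampling is involved.
    """
    k = k0
    c = a - b * k  # closed-loop coefficient
    if abs(c) >= 1:
        raise ValueError("initial gain must be stabilizing")
    # Start the critic at the exact value function coefficient of the initial policy.
    p = (q + r * k**2) / (1 - c**2)
    for _ in range(iters):
        c = a - b * k
        # Critic: one relaxation step on the Bellman equation p = q + r k^2 + c^2 p
        # (a stand-in for LSTD, not LSTD itself).
        bellman_residual = q + r * k**2 + c**2 * p - p
        # Actor: natural policy gradient direction for LQR, up to a constant factor,
        # evaluated with the current critic estimate p.
        nat_grad = (r + b**2 * p) * k - a * b * p
        # Both variables move simultaneously, using the values from the previous iterate.
        p, k = p + alpha * bellman_residual, k - beta * nat_grad
    return k, p


if __name__ == "__main__":
    k, p = single_timescale_actor_critic(1.1, 1.0, 1.0, 1.0, k0=0.5)
    k_star, p_star = riccati_solution(1.1, 1.0, 1.0, 1.0)
    print("gain error:", abs(k - k_star), "value error:", abs(p - p_star))
```

In this sketch the critic is never run to convergence between actor updates, which is the feature that distinguishes a single timescale scheme from a nested or two timescale one. The step sizes were chosen by hand for this particular instance and carry no guarantee for other problem constants.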