Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity

Authors: Ali Kara, Naci Saldi, Serdar Yüksel; Volume 24, Issue 199, Pages 1-34, 2023.

Abstract

This paper studies the applicability of reinforcement learning algorithms to Markov decision processes (MDPs), also known as controlled Markov chains, with continuous state and action spaces. It shows that Q-learning for standard Borel MDPs, applied after quantizing the states and actions (called Quantized Q-Learning), converges to a limit under mild regularity conditions. This limit satisfies an optimality equation, which yields near optimality with either explicit performance bounds or guarantees of asymptotic optimality. The approach treats quantization as a measurement kernel, so that the quantized MDP can be viewed as a partially observed Markov decision process (POMDP). Combining near-optimality and convergence results for Q-learning in POMDPs with the near optimality of finite-state model approximations for MDPs with weakly continuous kernels, the paper establishes a general convergence and approximation result for the applicability of Q-learning to continuous MDPs.
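
As a rough illustration of the idea in the abstract, the sketch below runs tabular Q-learning on the quantized observations of a continuous-state process. It is a minimal toy under stated assumptions, not the authors' implementation: the dynamics and cost in `step`, the uniform binning in `quantize`, the three-point action grid, the discounted-cost criterion with factor `beta`, the uniform exploration policy, and the visit-count step size are all hypothetical choices made for this example.

```python
import random


def quantize(x, n_bins):
    """Map x in [0, 1] to the index of its uniform bin (the 'measurement')."""
    return min(int(x * n_bins), n_bins - 1)


def step(x, u, rng):
    """Hypothetical dynamics and stage cost on the state space [0, 1]."""
    cost = (x - 0.5) ** 2 + 0.1 * u ** 2
    x_next = 0.9 * x + 0.1 * u + rng.gauss(0.0, 0.1)
    return min(max(x_next, 0.0), 1.0), cost


def quantized_q_learning(n_bins=10, actions=(0.0, 0.5, 1.0),
                         beta=0.9, n_steps=50000, seed=0):
    """Q-learning that only ever sees the bin index of the true state."""
    rng = random.Random(seed)
    q = [[0.0] * len(actions) for _ in range(n_bins)]
    visits = [[0] * len(actions) for _ in range(n_bins)]
    x = rng.random()  # true continuous state, hidden from the learner
    for _ in range(n_steps):
        y = quantize(x, n_bins)
        a = rng.randrange(len(actions))  # uniform exploration (assumption)
        x_next, cost = step(x, actions[a], rng)
        y_next = quantize(x_next, n_bins)
        visits[y][a] += 1
        alpha = 1.0 / visits[y][a]  # visit-count step size (assumption)
        target = cost + beta * min(q[y_next])  # costs are minimized
        q[y][a] += alpha * (target - q[y][a])
        x = x_next
    return q


def greedy_policy(q, actions):
    """Pick, for each bin, the quantized action with the smallest Q-value."""
    return [actions[min(range(len(actions)), key=row.__getitem__)] for row in q]


if __name__ == "__main__":
    grid = (0.0, 0.5, 1.0)
    table = quantized_q_learning(actions=grid)
    print(greedy_policy(table, grid))
```

The learner updates its table from the bin index alone, which is the sense in which the quantizer acts as a measurement channel; refining `n_bins` and the action grid is the knob that the paper's approximation results concern.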
