Learning Good State and Action Representations for Markov Decision Process via Tensor Decomposition

Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Mengdi Wang, Anru R. Zhang; 24(115):1−53, 2023.

Abstract

This paper presents an unsupervised learning approach that uses tensor decomposition to extract meaningful low-dimensional representations of states and actions in a continuous-state-action Markov decision process (MDP). By exploiting the tensor structure of the MDP through kernelization, importance sampling, and low-Tucker-rank approximation, the method identifies informative representations from empirical trajectories. The method can also be applied to cluster states and actions, leading to the discovery of the optimal discrete MDP abstraction. The paper provides rigorous statistical error bounds for tensor concentration and for the preservation of diffusion distance after embedding. Furthermore, it proves that the learned state/action abstractions accurately approximate latent block structures, making them useful for function approximation in downstream tasks such as policy evaluation.
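To make the low-Tucker-rank step concrete, the following is a minimal sketch, not the paper's algorithm (which handles continuous spaces via kernelization and importance sampling): it forms an empirical transition tensor over discretized states and actions and embeds states and actions via higher-order SVD, a standard low-Tucker-rank method. All sizes, ranks, and names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 20, 5          # assumed discretization sizes
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)    # P[s, a, :] is a transition distribution

def unfold(T, mode):
    """Matricize tensor T along `mode` (rows indexed by that mode)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_embeddings(T, ranks):
    """Leading left singular vectors of each unfolding: HOSVD factor matrices."""
    return [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

# Embed states, actions, and next states; ranks are assumed Tucker ranks.
U_state, U_action, U_next = hosvd_embeddings(P, ranks=(4, 3, 4))
print(U_state.shape, U_action.shape)  # (20, 4) (5, 3)

# Rows of U_state / U_action can then be clustered (e.g., with k-means)
# to recover a discrete MDP abstraction, as described in the abstract.
```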
