Convex Reinforcement Learning in Finite Trials
Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli; 24(250):1−42, 2023.
Abstract
Convex Reinforcement Learning (RL) is a framework that extends the standard RL objective to any convex (or concave) function of the state distribution induced by the agent's policy. This framework encompasses several practical applications, including pure exploration, imitation learning, and risk-averse RL. However, existing research on convex RL primarily evaluates performance in expectation over infinitely many realizations, whereas many applications demand strong performance within a limited number of trials. To address this practical need, we propose a formulation of convex RL in finite trials, in which the objective is any convex function of the empirical state distribution computed over a finite number of realizations. In this paper, we provide a comprehensive theoretical study of this setting, including an analysis of the importance of non-Markovian policies for achieving optimality, as well as a characterization of the computational and statistical complexity of the problem in different scenarios.
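To make the finite-trials objective concrete, the following is a minimal sketch (not the paper's algorithm): it computes the empirical state distribution from a small batch of sampled trajectories and evaluates a concave objective on it, here state entropy as in pure exploration. The trajectory data and function names are illustrative assumptions.

```python
import numpy as np

def empirical_state_distribution(trajectories, n_states):
    # Empirical state distribution over a finite batch of trajectories:
    # the fraction of visits each state receives across all realizations.
    counts = np.zeros(n_states)
    for traj in trajectories:
        for s in traj:
            counts[s] += 1
    return counts / counts.sum()

def entropy_objective(d_hat, eps=1e-12):
    # Concave objective F(d) = -sum_s d(s) log d(s), used in pure
    # exploration; eps guards against log(0) for unvisited states.
    return float(-np.sum(d_hat * np.log(d_hat + eps)))

# Two trials in a hypothetical 4-state MDP (illustrative data).
trajs = [[0, 1, 2, 1], [0, 2, 3, 3]]
d_hat = empirical_state_distribution(trajs, n_states=4)
score = entropy_objective(d_hat)
```

In the infinite-trials formulation, the objective would instead be applied to the expected state distribution; the finite-trials objective above is evaluated on each batch's empirical distribution, which is what the paper argues matters in practice.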