Bayesian Exploration Networks: A Study in Machine Learning (arXiv:2308.13049v1 [cs.LG])
by instadatahelp | Aug 28, 2023 | AI Blogs

The paper presents Bayesian reinforcement learning (RL) as a principled approach to sequential decision-making under uncertainty. A key advantage over frequentist methods is that Bayesian agents do not face an exploration/exploitation dilemma. However, the computational cost of learning Bayes-optimal policies poses a challenge, largely confining the approach to toy domains. To address this, the authors propose a novel model-free formulation that models uncertainty in a one-dimensional Bellman operator rather than in high-dimensional state-transition distributions. Their analysis shows that existing model-free approaches either fail to propagate epistemic uncertainty or optimize over a restricted set of contextual policies, and can therefore produce arbitrarily Bayes-suboptimal policies.

To overcome these issues, the authors introduce the Bayesian exploration network (BEN), which uses normalizing flows to model both aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. Complete optimization of BEN's objective recovers true Bayes-optimal policies, while partial optimization, analogous to variational expectation-maximization, keeps the method tractable. Empirical results show that BEN learns true Bayes-optimal policies in tasks where existing model-free approaches fail.
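The post summarizes the method only at a high level and does not include code. Purely as an illustrative sketch of the two kinds of uncertainty described above, the toy PyTorch snippet below conditions a one-dimensional affine flow on state-action features (aleatoric uncertainty via density estimation) and places a Gaussian variational posterior over a single shift parameter (epistemic uncertainty via variational inference). The class name ToyBellmanFlow, the affine flow, and the scalar parameter w are assumptions made here for illustration; they are not the authors' architecture, which uses richer normalizing flows.

```python
import math

import torch
import torch.nn as nn


class ToyBellmanFlow(nn.Module):
    """Hypothetical sketch (not the authors' code): a conditional affine flow
    over one-dimensional Bellman targets.

    Aleatoric: the flow maps base noise z ~ N(0, 1) to a target
    b = mu(h) + exp(log_sigma(h)) * z, conditioned on state-action features h.
    Epistemic: a diagonal-Gaussian variational posterior over a scalar shift w.
    """

    def __init__(self, feat_dim: int, hidden: int = 32):
        super().__init__()
        # Conditioner networks for the affine flow (aleatoric part).
        self.mu = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.log_sigma = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        # Variational posterior q(w) = N(w_mean, exp(w_log_std)^2) (epistemic part).
        self.w_mean = nn.Parameter(torch.zeros(1))
        self.w_log_std = nn.Parameter(torch.zeros(1))

    def sample_w(self) -> torch.Tensor:
        # Reparameterised draw from q(w): epistemic uncertainty over the operator.
        return self.w_mean + torch.exp(self.w_log_std) * torch.randn(1)

    def log_prob(self, b: torch.Tensor, h: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # Change-of-variables log-density of Bellman target b under the affine flow.
        mu, log_sigma = self.mu(h) + w, self.log_sigma(h)
        z = (b - mu) * torch.exp(-log_sigma)
        base = -0.5 * (z ** 2 + math.log(2.0 * math.pi))
        return (base - log_sigma).squeeze(-1)

    def kl_to_prior(self) -> torch.Tensor:
        # Closed-form KL(q(w) || N(0, 1)): the epistemic regulariser in the ELBO.
        var = torch.exp(2.0 * self.w_log_std)
        return 0.5 * (var + self.w_mean ** 2 - 1.0 - 2.0 * self.w_log_std).sum()


# One partial-optimisation step in the spirit of variational EM: draw w from the
# posterior, then take a single ELBO gradient step on a minibatch of
# (features, bootstrapped Bellman target) pairs.
model = ToyBellmanFlow(feat_dim=4)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
h = torch.randn(64, 4)   # stand-in state-action features
b = torch.randn(64, 1)   # stand-in bootstrapped targets r + gamma * V(s')
loss = -model.log_prob(b, h, model.sample_w()).mean() + model.kl_to_prior() / 64
optimiser.zero_grad()
loss.backward()
optimiser.step()
```

A single gradient step like the one above is meant to echo the partial-optimization idea mentioned in the summary: the flow conditioners and the variational posterior are improved jointly without solving either subproblem to completion.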