Adaptation to the Range in K-Armed Bandits

Authors: Hédi Hadiji, Gilles Stoltz; Publication Date: 2023; Pages: 1-33

Abstract

This study focuses on stochastic bandit problems involving K arms, each associated with a distribution supported on a given finite range [m, M]. The range [m, M] is not assumed to be known, and the study demonstrates that there is a cost associated with learning this range. This introduces a new trade-off between distribution-dependent and distribution-free regret bounds, making it impossible to simultaneously achieve the typical ln T and sqrt(T) bounds. In order to achieve a sqrt(T) distribution-free regret bound, the distribution-dependent regret bounds must be at least of order sqrt(T). The study presents a strategy that achieves the regret rates imposed by this new trade-off.

[abs]

[pdf][bib]