Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection
Wenhao Li, Ningyuan Chen, L. Jeff Hong; 24(136):1−84, 2023.
Abstract
This study addresses the problem of dimension reduction in contextual online learning (multi-armed bandit) with high-dimensional covariates $x$ and decision $y$. The reward function to be learned, $f(x,y)$, does not have a specific parametric form. Previous research has indicated that the optimal regret is $\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ represent the dimensions of $x$ and $y$, respectively, resulting in the curse of dimensionality. In many applications, only a small subset of variables in the covariate affect the value of $f$, which is known as sparsity in statistics. To exploit the sparsity structure of the covariate, the authors propose a variable selection algorithm called BV-LASSO. This algorithm incorporates novel techniques such as binning and voting to apply LASSO to nonparametric settings. By using BV-LASSO as a subroutine, the authors achieve a regret of $\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ represents the effective covariate dimension. The regret matches the optimal regret when the covariate is $d^*_x$-dimensional, indicating that no further improvement is possible. The proposed algorithm can be considered a general approach for achieving dimension reduction through variable selection in nonparametric settings.
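The binning-and-voting idea can be illustrated with a minimal sketch (this is not the paper's implementation; the 1-D bin layout, regularization level, and vote threshold below are illustrative assumptions): partition the covariate space into bins so that $f$ is roughly linear within each bin, run a LASSO on the locally centered data in each bin, and let the bins vote on which covariates are relevant.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO for (1/2)||y - X b||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding feature j, then soft-threshold.
            resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ resid
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
n, p, n_bins = 2000, 10, 8
x = rng.uniform(0.0, 1.0, (n, p))
# Nonparametric reward: only x[:, 0] and x[:, 1] matter (sparsity).
y = np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# Binning: partition along x[:, 0] (a 1-D stand-in for a hypercube
# partition of the covariate space), so f is near-linear inside each bin.
bin_idx = np.minimum((x[:, 0] * n_bins).astype(int), n_bins - 1)
votes = np.zeros(p, dtype=int)
for b in range(n_bins):
    mask = bin_idx == b
    Xb = x[mask]
    Xb = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0)   # standardize within bin
    yb = y[mask] - y[mask].mean()
    beta = lasso_cd(Xb, yb, alpha=0.05 * mask.sum())
    votes += (np.abs(beta) > 1e-8).astype(int)     # each bin casts votes

selected = np.flatnonzero(votes >= n_bins // 2)    # majority-style vote
print("votes:", votes, "selected:", selected)
```

Voting is what makes the local fits robust: a single bin may miss a relevant variable (e.g. where the local slope of $\sin(3x_0)$ is near zero) or spuriously pick an irrelevant one, but aggregating selections across bins recovers the true support $\{x_0, x_1\}$.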