The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time

Authors: Raj Agrawal, Tamara Broderick; Published in 2023, 24(27):1−60.

Abstract

Identifying a small set of covariates associated with a target response and estimating their effects is a common requirement in scientific problems. However, traditional linear and additive methods often fail to capture nonlinearity and interactions in the data, resulting in poor estimation and variable selection. Methods that handle sparsity, nonlinearity, and interactions simultaneously are computationally challenging, with runtime growing at least quadratically in the number of covariates. This paper addresses that computational bottleneck. The authors show that suitable interaction models can be represented using a “kernel trick,” enabling variable selection and estimation in linear time, i.e., O(# covariates). The resulting fit corresponds to a sparse orthogonal decomposition of the regression function in a Hilbert space, where interaction effects capture all the variation that cannot be explained by lower-order effects. On a variety of large, high-dimensional synthetic and real data sets, the proposed approach outperforms existing methods while being competitive with them, or significantly faster, in runtime.
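
To illustrate why a kernel over all pairwise interactions need not cost quadratic time, here is a minimal Python sketch. It is not the authors' SKIM-FA kernel, which has further structure (sparsity and the orthogonal decomposition) not reproduced here. It only shows a standard algebraic identity: the sum over all pairs i < j of k_i * k_j equals ((sum of k_i)^2 - sum of k_i^2) / 2, so the pairwise sum can be evaluated in O(# covariates). The function names and the choice of a one-dimensional RBF kernel per covariate are assumptions made for illustration.

```python
import math
from itertools import combinations


def per_covariate_kernels(x, z):
    """One-dimensional RBF kernel value per covariate (illustrative choice)."""
    return [math.exp(-0.5 * (a - b) ** 2) for a, b in zip(x, z)]


def pairwise_kernel_naive(x, z):
    """Sum of k_i * k_j over all pairs i < j: O(p^2) in the number of covariates p."""
    k = per_covariate_kernels(x, z)
    return sum(k[i] * k[j] for i, j in combinations(range(len(k)), 2))


def pairwise_kernel_linear(x, z):
    """Same quantity in O(p), using ((sum k)^2 - sum k^2) / 2."""
    k = per_covariate_kernels(x, z)
    s1 = sum(k)                 # sum of per-covariate kernel values
    s2 = sum(v * v for v in k)  # sum of their squares
    return 0.5 * (s1 * s1 - s2)
```

Both functions return the same value up to floating-point rounding for any pair of equal-length inputs; only the second avoids enumerating the pairs, which is what matters when the number of covariates is large.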
