HiGrad: Uncertainty Quantification for Online Learning and Stochastic Approximation
Weijie J. Su, Yuancheng Zhu; 24(124):1−53, 2023.
Abstract
Stochastic gradient descent (SGD) is a widely used method for online learning in settings where data arrive in a continuous stream or the data size is very large. However, despite the extensive research on SGD, little is known about the statistical inferential properties of SGD-based predictions. This paper presents a new procedure called HiGrad, which allows for statistical inference in online learning without incurring additional computational cost compared with SGD. The HiGrad procedure begins by performing SGD updates for a certain period and then splits the single thread into multiple threads, with each thread in turn operating hierarchically in the same manner. From the predictions of the multiple threads, a confidence interval based on the t-distribution is constructed by decorrelating the predictions using covariance structures given by a Donsker-style extension of the Ruppert-Polyak averaging scheme. This extension is a technical contribution of independent interest. Under certain regularity conditions, the HiGrad confidence interval is proven to achieve asymptotically exact coverage probability. The performance of HiGrad is evaluated through extensive simulation studies and a real data example. Additionally, an R package called “higrad” has been developed to implement the HiGrad method.
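The splitting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the "higrad" R package. It assumes a single split level, a one-dimensional squared loss, a polynomially decaying step size, and equal segment weights `w0` and `w1`. All function names and parameter values here are hypothetical choices made for illustration. The decorrelation step and the t-based confidence interval are deliberately left out, since they depend on the covariance structure derived in the paper.

```python
import random


def sgd_segment(theta, stream, n_steps, start_step, c=0.5, alpha=0.55):
    """Run n_steps of SGD on the loss 0.5 * (theta - z)**2.

    The step size c * j**(-alpha) continues from start_step, so a thread
    picks up the schedule where its parent segment stopped (an assumption
    of this sketch). Returns (last iterate, average of the segment iterates).
    """
    total = 0.0
    for j in range(1, n_steps + 1):
        z = next(stream)                      # one fresh observation per update
        step = c * (start_step + j) ** (-alpha)
        theta -= step * (theta - z)           # gradient of the squared loss
        total += theta
    return theta, total / n_steps


def higrad_one_split(stream, n0, n1, n_threads, theta0=0.0, w0=0.5, w1=0.5):
    """One-level illustration: a root segment, then n_threads child segments.

    Each thread's estimate is a weighted average of the root segment average
    and its own segment average. The weights are hypothetical placeholders.
    """
    theta_root, avg_root = sgd_segment(theta0, stream, n0, start_step=0)
    estimates = []
    for _ in range(n_threads):
        # Every thread restarts from the root's last iterate and uses new data.
        _, avg_thread = sgd_segment(theta_root, stream, n1, start_step=n0)
        estimates.append(w0 * avg_root + w1 * avg_thread)
    return estimates


def gaussian_stream(mean, sd, seed):
    """Endless stream of Gaussian observations (synthetic data for the demo)."""
    rng = random.Random(seed)
    while True:
        yield rng.gauss(mean, sd)


# Total updates: n0 + n_threads * n1, the same as one SGD run of that length.
estimates = higrad_one_split(
    gaussian_stream(mean=2.0, sd=1.0, seed=0), n0=2000, n1=2000, n_threads=4
)
print(estimates)
```

The per-thread estimates share the root segment and are therefore correlated, which is why a plain t-interval on them would not be appropriate and a decorrelation step is needed.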