Convergence of Stochastic Gradient Descent with Bandwidth-based Step Size
Xiaoyu Wang, Ya-xiang Yuan; 24(48):1−49, 2023.
Abstract
This paper introduces a novel step-size framework for the stochastic gradient descent (SGD) method, called bandwidth-based step sizes. These step sizes are allowed to vary within a banded region, providing efficient and flexible step-size selection in optimization. The framework includes cyclical and non-monotonic step sizes, such as the triangular policy and cosine with restart, for which theoretical guarantees are rare. The paper presents state-of-the-art convergence guarantees for SGD under mild conditions and allows for a large constant step size at the beginning of training. Furthermore, the error bounds of SGD under the bandwidth step size are investigated for boundary functions of both the same and different orders. The paper also proposes a $1/t$ up-down policy and designs novel non-monotonic step sizes. Numerical experiments demonstrate the efficiency and significant potential of these bandwidth-based step sizes in training regularized logistic regression and several large-scale neural network tasks.
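For concreteness, the sketch below illustrates the general idea of a step size confined to a band between a lower and an upper boundary function. The $1/t$ boundaries, the constants `m` and `M`, the triangular variation inside the band, and the quadratic test problem are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def bandwidth_step_size(t, m=0.1, M=1.0, period=50):
    """Illustrative step size confined to the band [m/t, M/t].

    The schedule oscillates triangularly inside the band (an assumed
    up-down style variation); any rule that stays within the band fits
    the same bandwidth idea.
    """
    lower, upper = m / t, M / t
    phase = (t % period) / period                  # position within the current cycle
    frac = 2 * phase if phase < 0.5 else 2 * (1 - phase)
    return lower + frac * (upper - lower)

def sgd(stochastic_grad, x0, n_steps=500, rng=None):
    """Plain SGD driven by the bandwidth-based step size above (sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, n_steps + 1):
        x -= bandwidth_step_size(t) * stochastic_grad(x, rng)
    return x

# Hypothetical usage: noisy gradients of f(x) = 0.5 * ||x||^2
x_final = sgd(lambda x, rng: x + 0.1 * rng.standard_normal(x.shape),
              x0=np.ones(5))
```

The point of the framework, as the abstract describes it, is that the guarantees hinge on the step size staying inside the band, so the rule used within the band may be cyclical or non-monotonic.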