Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning

Zhuang Yang; 24(241):1−29, 2023.

Abstract

Stochastic optimization, particularly stochastic gradient descent (SGD), has become the most commonly used method for solving machine learning problems. To enhance the performance of the traditional SGD algorithm, which suffers from a slow convergence rate and poor generalization, several strategies have been developed, such as control variates, adaptive learning rates, and momentum techniques. Most of these strategies focus on controlling the updating direction (e.g., gradient descent or gradient ascent) or manipulating the learning rate. In this study, we propose a novel class of improved powered stochastic gradient descent algorithms that use the Powerball function to determine the updating direction. We also address the issue of the learning rate in powered stochastic optimization (PSO) by introducing an adaptive mechanism based on a Barzilai-Borwein (BB)-like scheme, not only for the proposed algorithms but also for classical PSO algorithms. The theoretical properties of these algorithms for non-convex optimization problems are analyzed. Empirical tests using various benchmark datasets demonstrate the efficiency and robustness of our proposed algorithms.
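To illustrate the two ingredients named in the abstract, the sketch below shows a generic powered SGD update that applies the Powerball transform sign(g)|g|^gamma elementwise to a stochastic gradient and pairs it with a Barzilai-Borwein-like step size. This is a minimal sketch under assumed names and hyperparameters (powerball, bb_step_size, gamma, eta0), not the paper's exact algorithm or its specific BB variant.

```python
import numpy as np

def powerball(g, gamma=0.5):
    # Elementwise Powerball transform: sign(g) * |g|^gamma.
    return np.sign(g) * np.abs(g) ** gamma

def bb_step_size(x, x_prev, g, g_prev, fallback=0.01):
    # BB-like step size: ||s||^2 / (s^T y), with s = x - x_prev, y = g - g_prev.
    # Falls back to a fixed value when the denominator is non-positive or tiny.
    s, y = x - x_prev, g - g_prev
    denom = s @ y
    return (s @ s) / denom if denom > 1e-12 else fallback

def powered_sgd(grad_fn, x0, n_iters=100, gamma=0.5, eta0=0.01):
    # grad_fn(x) is assumed to return a stochastic gradient estimate at x.
    x, x_prev, g_prev = x0.copy(), None, None
    eta = eta0
    for _ in range(n_iters):
        g = grad_fn(x)
        if x_prev is not None:
            eta = bb_step_size(x, x_prev, g, g_prev)
        x_prev, g_prev = x.copy(), g.copy()
        x = x - eta * powerball(g, gamma)  # powered descent step
    return x
```

With gamma = 1 the update reduces to plain SGD; gamma < 1 rescales small and large gradient components differently, which is the rough intuition behind the powered direction.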
