Sparse Training with Lipschitz Continuous Loss Functions and a Weighted Group L0-norm Constraint
Michael R. Metel; 24(103):1−44, 2023.
Abstract
This paper studies structured sparsity in deep neural network training through a weighted group $l_0$-norm constraint, characterizing the constraint set's projection and normal cone. Using randomized smoothing, zeroth- and first-order algorithms are developed for minimizing a Lipschitz continuous loss function over any closed set admitting a projection. Non-asymptotic convergence guarantees are established for the proposed algorithms under two notions of approximate stationarity. In addition, two methods built on these algorithms are presented: one with a non-asymptotic convergence guarantee holding with high probability, and one with asymptotic convergence to an almost surely stationary point. These are the first non-asymptotic convergence results for constrained Lipschitz continuous loss functions.
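To make the constraint concrete, below is a minimal sketch of a Euclidean projection onto a weighted group $l_0$-norm constraint set $\{z : \sum_g w_g 1[z_g \neq 0] \le W\}$. The function name, and the assumption that group weights are positive integers (so that an exact 0/1 knapsack dynamic program applies), are illustrative choices and not taken from the paper.

```python
import numpy as np

def project_weighted_group_l0(x, groups, weights, budget):
    """Euclidean projection of x onto {z : sum_g weights[g] * 1[z_g != 0] <= budget}.

    groups:  list of index arrays partitioning the coordinates of x.
    weights: positive integer cost per group (integrality assumed here so
             an exact 0/1 knapsack dynamic program can be used).
    The projection keeps a subset of groups maximizing the retained squared
    norm subject to the weight budget, and zeros out the remaining groups.
    """
    values = [np.sum(x[g] ** 2) for g in groups]  # gain from keeping group g
    # best[b] = largest retained squared norm with total weight <= b;
    # keep[b] = the corresponding set of kept groups.
    best = [0.0] * (budget + 1)
    keep = [frozenset()] * (budget + 1)
    for g in range(len(groups)):
        w, v = weights[g], values[g]
        for b in range(budget, w - 1, -1):  # reverse order: each group used once
            if best[b - w] + v > best[b]:
                best[b] = best[b - w] + v
                keep[b] = keep[b - w] | {g}
    z = np.zeros_like(x)
    for g in keep[budget]:
        z[groups[g]] = x[groups[g]]
    return z
```

When all weights equal 1, this reduces to keeping the budget-many groups with the largest Euclidean norms, i.e., the familiar unweighted group $l_0$ projection.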
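The randomized smoothing idea can be illustrated with a standard two-point Gaussian-smoothing gradient estimator. This sketch assumes Gaussian smoothing and an illustrative function name; it is one common construction for Lipschitz continuous functions, not necessarily the exact estimator analyzed in the paper.

```python
import numpy as np

def zeroth_order_grad(f, x, mu=1e-3, num_samples=20, rng=None):
    """Two-point gradient estimator for the Gaussian smoothing
    f_mu(x) = E_{u ~ N(0, I)}[f(x + mu * u)] of a Lipschitz function f.

    Each sample (f(x + mu*u) - f(x)) / mu * u is an unbiased estimate of
    grad f_mu(x); averaging over num_samples directions reduces variance.
    """
    rng = np.random.default_rng(rng)
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_samples
```

Combined with the projection sketched above, a step of the form x <- project_weighted_group_l0(x - eta * g, ...) gives a basic zeroth-order projected method of the general kind whose convergence the paper analyzes.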