Lifted Bregman Training of Neural Networks
Xiaoyu Wang, Martin Benning; 24(232):1−51, 2023.
Abstract
A new mathematical formulation is introduced for training feed-forward neural networks with potentially non-smooth proximal maps as activation functions. This formulation is based on Bregman distances, and a major advantage is that its partial derivatives with respect to the network’s parameters do not require the computation of derivatives of the network’s activation functions. Instead of estimating the parameters with a combination of first-order optimization methods and back-propagation (as is the current state of the art), the use of non-smooth first-order optimization methods that exploit the specific structure of the novel formulation is proposed. Several numerical results demonstrate that these training approaches can be as well suited as, or better suited than, more conventional training frameworks for training neural network-based classifiers and (denoising) autoencoders with sparse coding.
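As background, the formulation builds on the generalized Bregman distance from convex analysis; a minimal sketch of that standard definition is given below (the symbols $\Psi$, $u$, $v$, $p$ are chosen here for illustration and are not taken from the paper). Non-smoothness of $\Psi$ is accommodated by using a subgradient $p$ in place of the gradient:

\[
  D_\Psi^{p}(u, v) \;=\; \Psi(u) - \Psi(v) - \langle p,\, u - v \rangle,
  \qquad p \in \partial \Psi(v).
\]

As a concrete instance of a non-smooth proximal-map activation, the ReLU function $\sigma(z) = \max(z, 0)$ is the proximal map of the characteristic function of the non-negative orthant (i.e., the projection onto it), so it falls within the class of activation functions covered by this framework.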