Previous work used backward error analysis to find ordinary differential equations (ODEs) that approximate the gradient descent trajectory. That work showed that finite step sizes implicitly regularize solutions, because terms appearing in the ODEs penalize the two-norm of the loss gradients. In this study, we prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, and that a different “norm” is involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, conversely, hinder its decrease (the latter case being typical). We also conduct numerical experiments and discuss how these proven facts may affect generalization.
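The gradient descent result mentioned above can be illustrated with a small sketch. It assumes the modified loss from the earlier backward-error-analysis literature, L + (h/4)·‖∇L‖², and a hypothetical one-dimensional quadratic loss L(θ) = aθ²/2, chosen only because both flows then have closed forms. The function names and parameter values are illustrative and not taken from the paper, and the sketch does not cover the RMSProp or Adam case.

```python
import math

def gd_iterate(theta0, a, h, steps):
    """Run plain gradient descent on L(theta) = a * theta**2 / 2."""
    theta = theta0
    for _ in range(steps):
        theta -= h * a * theta  # gradient of L is a * theta
    return theta

def gradient_flow(theta0, a, t):
    """Exact solution of d(theta)/dt = -L'(theta) at time t."""
    return theta0 * math.exp(-a * t)

def modified_flow(theta0, a, h, t):
    """Exact solution of the flow on L + (h/4) * L'(theta)**2 at time t.

    For the quadratic loss the modified gradient is (a + h * a**2 / 2) * theta.
    """
    a_mod = a + 0.5 * h * a * a
    return theta0 * math.exp(-a_mod * t)

# Illustrative values (assumptions, not from the paper).
theta0, a, h, steps = 1.0, 1.0, 0.1, 10
gd = gd_iterate(theta0, a, h, steps)
err_flow = abs(gd - gradient_flow(theta0, a, h * steps))
err_mod = abs(gd - modified_flow(theta0, a, h, h * steps))
print(f"GD iterate:                {gd:.6f}")
print(f"error vs. gradient flow:   {err_flow:.6f}")
print(f"error vs. modified flow:   {err_mod:.6f}")
```

In this toy setting the discrete iterates stay closer to the flow on the modified loss than to the plain gradient flow, which is the sense in which the extra gradient-norm term describes the effect of a finite step size.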