Deep linear networks can overfit benignly when shallow ones do

Authors: Niladri S. Chatterji, Philip M. Long; Published in 2023, Vol. 24(117), Pages 1-39.

Abstract

We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate, or even match, the known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution. Because the noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm’s ability to “hide the noise”. Our simulations confirm that aspects of our bounds reflect typical behavior for simple data distributions. We also observe similar phenomena in simulations with ReLU networks, although the situation there is more nuanced.
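As a rough illustration of the comparison described in the abstract, the sketch below (Python with NumPy, not the authors' code) fits an overparameterized noisy linear regression problem in two ways: with the minimum $\ell_2$-norm interpolant and with a randomly initialized depth-3 linear network trained by full-batch gradient descent, which stands in for the gradient flow analyzed in the paper. The dimensions, initialization scale, learning rate, and step count are illustrative assumptions, not the paper's experimental settings.

```python
# Compare the minimum l2-norm interpolant with a randomly initialized depth-3
# linear network trained by gradient descent on the same overparameterized
# regression problem (d > n). Illustrative toy setup, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 15, 60, 60                              # n samples, d features, hidden width
X = rng.standard_normal((n, d))                       # isotropic Gaussian features
theta_star = rng.standard_normal(d) / np.sqrt(d)      # ground-truth parameter
y = X @ theta_star + 0.1 * rng.standard_normal(n)     # noisy labels

# Minimum l2-norm interpolant: theta_min = X^+ y.
theta_min = np.linalg.pinv(X) @ y

# Depth-3 linear network f(x) = x @ W1 @ W2 @ W3 with random Gaussian init.
scale = 0.05
Ws = [scale * rng.standard_normal(s) for s in [(d, width), (width, width), (width, 1)]]

def chain(mats):
    """Product of a list of matrices, left to right."""
    out = mats[0]
    for m in mats[1:]:
        out = out @ m
    return out

lr = 0.02
for step in range(20000):
    theta = chain(Ws)                                 # end-to-end linear map, shape (d, 1)
    resid = (X @ theta).ravel() - y
    grad_theta = X.T @ resid[:, None] / n             # gradient of 0.5 * mean squared error
    # Chain rule: dL/dW_i = (W_1...W_{i-1})^T dL/dtheta (W_{i+1}...W_L)^T.
    grads = []
    for i in range(len(Ws)):
        left = np.eye(d) if i == 0 else chain(Ws[:i])
        right = np.eye(1) if i == len(Ws) - 1 else chain(Ws[i + 1:])
        grads.append(left.T @ grad_theta @ right.T)
    Ws = [W - lr * g for W, g in zip(Ws, grads)]

theta_deep = chain(Ws).ravel()

# For x ~ N(0, I_d), the excess risk of a linear predictor theta_hat over the
# Bayes predictor is ||theta_hat - theta_star||^2.
print("train MSE (deep network):", np.mean((X @ theta_deep - y) ** 2))
print("excess risk, min-norm   :", np.sum((theta_min - theta_star) ** 2))
print("excess risk, deep linear:", np.sum((theta_deep - theta_star) ** 2))
```

With isotropic features as above, the two excess risks come out comparable but not small; only the comparison between the two predictors is meaningful here, since benign overfitting for the minimum-norm interpolant depends on covariance conditions of the kind analyzed in the paper.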
