Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu; 24(65):1−37, 2023.
Abstract
This paper presents detailed experiments supporting the argument that wide minima generalize better than narrow minima. Additionally, the authors propose a new hypothesis that the density of wide minima is likely lower than that of narrow minima, and provide empirical evidence for it. Motivated by this hypothesis, the authors design a novel explore-exploit learning rate schedule. On a variety of image and natural language datasets, the explore-exploit schedule, compared to the original hand-tuned learning rate baselines, achieves either up to 0.84% higher absolute accuracy within the original training budget or up to 57% reduced training time while matching the originally reported accuracy.
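The explore-exploit idea described above can be sketched as a two-phase learning rate schedule: a constant high-rate "explore" phase followed by a decaying "exploit" phase. The sketch below is a minimal illustration under assumed names and a linear decay; it is not the paper's exact schedule or hyperparameters.

```python
def explore_exploit_lr(step, total_steps, explore_steps, peak_lr):
    """Illustrative two-phase schedule (assumed form, not the paper's exact one):
    hold peak_lr for an initial "explore" phase, then linearly decay
    to zero over the remaining "exploit" steps."""
    if step < explore_steps:
        # Explore: stay at a high learning rate to escape narrow minima.
        return peak_lr
    # Exploit: linearly anneal from peak_lr down to zero.
    decay_fraction = (step - explore_steps) / max(1, total_steps - explore_steps)
    return peak_lr * (1.0 - decay_fraction)
```

In a typical training loop, such a function would be wrapped in a framework scheduler (e.g. a per-step lambda scheduler) and queried once per optimization step.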