Theoretical Optimality and Practical Improvements of Decentralized Learning
Authors: Yucheng Lu, Christopher De Sa; Published in 2023, Volume 24, Issue 93, Pages 1-62.
Abstract
Scaling up parallel machine learning systems through decentralization has shown great potential. This paper provides a tight lower bound on the iteration complexity of decentralized training algorithms in a stochastic non-convex setting, exposing a theoretical gap between this bound and the known convergence rates of existing methods such as D-PSGD. The authors further show that the lower bound is achievable. Based on these findings, they introduce DeTAG, a practical gossip-style decentralized algorithm that matches the lower bound up to a logarithmic gap, and show that DeTAG attains this rate using only plain SGD with a constant step size. Additionally, they provide a convergence bound for DeTAG under general non-increasing step sizes and momentum. Experimental evaluations on vision benchmarks, including CIFAR10/100 and ImageNet, support the theoretical claims, showing that DeTAG converges faster than baseline methods on unshuffled data and in sparse networks. Furthermore, the authors introduce DeTAG*, a variant of DeTAG that accelerates data-center-scale model training. This manuscript extends the ICML version with additional content.
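For context, the sketch below illustrates the generic gossip-style decentralized SGD setting the abstract refers to, in the spirit of D-PSGD rather than the authors' DeTAG: each worker takes a local stochastic gradient step and then averages its iterate with its neighbors through a doubly stochastic mixing matrix. The toy quadratic objective, the function names, and the ring topology are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical local objective per worker: f_i(x) = 0.5 * ||x - b_i||^2,
# so grad f_i(x) = x - b_i; Gaussian noise mimics stochastic gradients.
def stochastic_grad(x, b_i, rng, noise_std=0.1):
    return (x - b_i) + noise_std * rng.standard_normal(x.shape)

def gossip_sgd(W, b, steps=200, lr=0.1, seed=0):
    """D-PSGD-style loop: each worker takes a local SGD step,
    then gossip-averages its iterate with neighbors via the mixing matrix W."""
    rng = np.random.default_rng(seed)
    n, d = b.shape
    X = np.zeros((n, d))                      # one row of parameters per worker
    for _ in range(steps):
        G = np.stack([stochastic_grad(X[i], b[i], rng) for i in range(n)])
        X = W @ (X - lr * G)                  # local update followed by gossip averaging
    return X.mean(axis=0)

if __name__ == "__main__":
    n, d = 4, 3
    rng = np.random.default_rng(1)
    b = rng.standard_normal((n, d))           # heterogeneous local data across workers
    # Ring topology: doubly stochastic mixing matrix (weight 1/3 to self and each neighbor).
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    x_hat = gossip_sgd(W, b)
    print("consensus estimate:", x_hat, "global minimizer:", b.mean(axis=0))
```

The sparsity of the mixing matrix W encodes the communication topology; sparser networks mix information more slowly, which is the regime where the abstract reports DeTAG's advantage over such baselines.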