Impact of Classification Difficulty on the Spectra of Weight Matrices in Deep Learning and its Application to Early Stopping
Xuran Meng, Jeff Yao; 24(28):1−40, 2023.
Abstract
Recent research has focused on understanding the success of deep learning. Random Matrix Theory (RMT) offers one approach: analyzing the spectra of large random matrices, such as the weight or Hessian matrices of deep neural networks (DNNs) trained with algorithms like stochastic gradient descent. To better understand weight matrix spectra, we conducted extensive experiments on weight matrices across a variety of layers, networks, and datasets. Building on previous work by Martin et al. (2018), we classified weight matrix spectra at the terminal stage of training into three main types: Light Tail (LT), Bulk Transition period (BT), and Heavy Tail (HT). These types, particularly HT, imply some form of regularization in DNNs. In this paper, inspired by Martin et al. (2018), we identify the difficulty of the classification problem as an important factor in the appearance of HT in weight matrix spectra: the higher the classification difficulty, the more likely HT is to appear. Classification difficulty can in turn be influenced by the signal-to-noise ratio of the dataset or by the complexity of the classification problem (such as complex features or a large number of classes). Leveraging this finding, we propose a spectral criterion that detects the presence of HT and uses it to stop training early, without the need for test data. These early-stopped DNNs avoid overfitting and unnecessary additional training while maintaining comparable generalization ability. We validate these findings with several neural networks (LeNet, MiniAlexNet, and VGG) on Gaussian synthetic data and real datasets (MNIST and CIFAR10).
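The abstract does not spell out the spectral criterion itself; the following is a minimal, hypothetical sketch of one way such a check could look, assuming the criterion counts eigenvalues of a layer's weight matrix that escape the Marchenko-Pastur bulk edge. The function name, the default thresholds `n_outliers` and `margin`, and the crude entrywise variance estimate are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def heavy_tail_detected(W, n_outliers=5, margin=1.0):
    """Flag a heavy-tailed (HT) spectrum for a weight matrix W.

    Compares the eigenvalues of W W^T / n with the Marchenko-Pastur bulk
    edge sigma^2 * (1 + sqrt(p / n))^2 and returns True when at least
    `n_outliers` eigenvalues exceed the (margin-scaled) edge.
    """
    W = np.asarray(W, dtype=np.float64)
    p, n = W.shape
    if p > n:                       # work with the smaller dimension as p
        W = W.T
        p, n = W.shape
    gamma = p / n
    sigma2 = np.var(W)              # crude plug-in estimate of entry variance
    bulk_edge = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
    eigvals = np.linalg.eigvalsh(W @ W.T / n)   # empirical spectral distribution
    n_escaped = int(np.sum(eigvals > margin * bulk_edge))
    return n_escaped >= n_outliers


if __name__ == "__main__":
    # Toy check: a pure-noise matrix versus one with a planted low-rank signal.
    rng = np.random.default_rng(0)
    W_noise = rng.normal(size=(256, 1024))
    U = rng.normal(size=(256, 10))
    V = rng.normal(size=(1024, 10))
    W_spiked = W_noise + 0.5 * U @ V.T
    print(heavy_tail_detected(W_noise))    # typically False: ESD stays in the MP bulk
    print(heavy_tail_detected(W_spiked))   # typically True: ~10 eigenvalues escape the edge
```

In a training loop, such a check could be run on the monitored fully-connected layers after each epoch, with training stopped once any of them is flagged; how the actual criterion chooses layers, thresholds, and the variance estimate is specified in the paper, not here.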