Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity

Artem Vysogorets, Julia Kempe; 24(99):1−23, 2023.

Abstract

Neural network pruning is an area of research that has gained significant interest, particularly in high sparsity regimes. In this field, accurate representation of subnetwork sparsity is crucial for benchmarking, and it is traditionally measured as the fraction of removed connections (direct sparsity). However, this definition fails to account for unpruned parameters that become disconnected from the input or output layers of the subnetwork, so it can understate the true effective sparsity: the fraction of inactivated connections. While this effect may be negligible for moderately pruned networks (compression ratios of 10–100), we find that it becomes increasingly significant for sparser subnetworks, distorting comparisons between pruning algorithms. For instance, we demonstrate that the effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart, whereas no such discrepancy arises when pruning with SynFlow (Tanaka et al., 2020). In this study, we adopt the perspective of effective sparsity to reevaluate several recent pruning algorithms on common benchmark architectures (e.g., LeNet-300-100, VGG-19, ResNet-18) and find that their absolute and relative performance changes dramatically in this new framework, which we argue is more appropriate. To target effective rather than direct sparsity, we propose a low-cost extension to most pruning algorithms. Furthermore, using effective sparsity as the reference frame, we partially confirm that random pruning with appropriate sparsity allocation across layers performs as well as or better than more sophisticated algorithms for pruning at initialization (Su et al., 2020). In response to this finding, we design novel layerwise sparsity quotas, inspired by an analogy with pressure distribution in coupled cylinders from thermodynamics, which outperform all existing baselines in the context of random pruning.
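To make the distinction between direct and effective sparsity concrete, the sketch below (not code from the paper) estimates both quantities for a masked multilayer perceptron: a surviving weight is counted as effective only if its input unit is reachable from the network input and its output unit can still reach the network output. The function name, layer shapes, and random-pruning setup are illustrative assumptions, and biases and activations are ignored.

```python
import numpy as np


def effective_sparsity(masks):
    """Estimate direct and effective sparsity of a masked MLP.

    `masks` is a list of binary arrays, one per fully connected layer,
    each of shape (fan_in, fan_out). A surviving weight is effective only
    if its input unit receives signal from the network input and its
    output unit can still influence the network output.
    """
    # Forward pass: which units are reachable from the input layer?
    reach_fwd = [np.ones(masks[0].shape[0], dtype=bool)]  # all inputs are live
    for m in masks:
        reach_fwd.append(m[reach_fwd[-1], :].any(axis=0))

    # Backward pass: which units can still reach the output layer?
    reach_bwd = [np.ones(masks[-1].shape[1], dtype=bool)]  # all outputs matter
    for m in reversed(masks):
        reach_bwd.insert(0, m[:, reach_bwd[0]].any(axis=1))

    total = sum(m.size for m in masks)
    # A weight counts as effective only if both of its endpoints are connected.
    effective = sum(
        (m * np.outer(reach_fwd[i], reach_bwd[i + 1])).sum()
        for i, m in enumerate(masks)
    )
    direct_sparsity = 1 - sum(m.sum() for m in masks) / total
    eff_sparsity = 1 - effective / total
    return direct_sparsity, eff_sparsity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Randomly prune a LeNet-300-100-shaped MLP (784-300-100-10) to ~99% direct sparsity.
    shapes = [(784, 300), (300, 100), (100, 10)]
    masks = [(rng.random(s) < 0.01).astype(float) for s in shapes]
    direct, effective = effective_sparsity(masks)
    print(f"direct sparsity:    {direct:.4f}")
    print(f"effective sparsity: {effective:.4f}")
```

On a randomly pruned mask like the one above, many hidden and output units typically lose all incoming or outgoing connections, so the reported effective sparsity exceeds the direct sparsity, mirroring the discrepancy described in the abstract.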
