Continual learning (CL) is the ability of an intelligent system to acquire and retain knowledge from a stream of data with minimal computational overhead. Various approaches, such as regularization, replay, architectural expansion, and parameter isolation, have been introduced to achieve this goal. Parameter isolation allocates different parts of a sparse neural network to different tasks while sharing parameters across similar tasks. Dynamic Sparse Training (DST) is a method commonly used to find and isolate these task-specific sparse subnetworks. This study fills a research gap by empirically investigating the effect of different DST components under the CL paradigm and identifying the optimal configuration for CL. It conducts a comprehensive evaluation of various DST components on the CIFAR100 and miniImageNet benchmarks in a task-incremental CL setup, focusing on the performance of different DST criteria rather than the process of mask selection. The study reveals that at low sparsity levels, Erdős-Rényi Kernel (ERK) initialization utilizes the backbone efficiently and facilitates effective learning of task increments, whereas at high sparsity levels, uniform initialization demonstrates more reliable and robust performance. The performance of the growth strategy depends on both the initialization strategy and the sparsity level. Lastly, incorporating adaptivity within DST components shows promise for enhancing continual learners.
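
For context, ERK initialization assigns each layer a density proportional to the sum of its dimensions divided by the product of its dimensions, whereas uniform initialization gives every layer the same density. The sketch below illustrates the two allocation schemes under simplified assumptions: the layer shapes are hypothetical, the helper names are not from the paper's code, and densities that exceed 1 are simply clipped rather than redistributed as in full ERK implementations.

```python
# Minimal sketch (not the paper's code): layer-wise density allocation for
# uniform vs. Erdos-Renyi Kernel (ERK) sparsity initialization.
import numpy as np

# Hypothetical backbone: (out_ch, in_ch, k_h, k_w) for conv, (out, in) for linear.
layers = {
    "conv1": (64, 3, 3, 3),
    "conv2": (128, 64, 3, 3),
    "fc":    (100, 128),
}

def uniform_densities(layers, target_density):
    """Uniform initialization: every layer keeps the same fraction of weights."""
    return {name: target_density for name in layers}

def erk_densities(layers, target_density):
    """ERK initialization: layer density is proportional to sum(shape) / prod(shape),
    rescaled so the overall density matches the target (clipping is simplified here)."""
    raw = {name: sum(shape) / np.prod(shape) for name, shape in layers.items()}
    n_params = {name: np.prod(shape) for name, shape in layers.items()}
    # Scale factor so the weighted average density hits the target overall density.
    eps = target_density * sum(n_params.values()) / sum(
        raw[name] * n_params[name] for name in layers
    )
    return {name: min(1.0, eps * raw[name]) for name in layers}

if __name__ == "__main__":
    print("uniform:", uniform_densities(layers, target_density=0.1))
    print("ERK:    ", {k: round(v, 3) for k, v in erk_densities(layers, 0.1).items()})
```

As the example shows, ERK concentrates a higher density in small or thin layers (here, conv1 and fc) and sparsifies large layers more aggressively, while uniform initialization sparsifies all layers equally; this difference underlies the contrast between the two schemes studied at low and high sparsity levels.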