Class Incremental Learning (CIL) aims to learn new classes sequentially while preventing the loss of previously acquired knowledge. To this end, we propose Masked Autoencoders (MAEs) as efficient learners for CIL. MAEs were originally designed for self-supervised representation learning through reconstruction, but they can be readily combined with a supervised loss for classification.
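As a rough illustration of combining the two objectives, the sketch below mimics the MAE recipe: mask a random subset of patches, compute a reconstruction error on the masked patches only, and add a classification term. All names (`random_mask`, `joint_loss`) and the weight `lam` are hypothetical; this is a minimal numpy sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(num_patches, mask_ratio=0.75):
    """MAE-style masking: keep a random (1 - mask_ratio) fraction of patches."""
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)
    return perm[:num_keep], perm[num_keep:]  # visible indices, masked indices

def joint_loss(recon, target, masked_idx, logits, label, lam=1.0):
    """Reconstruction MSE on masked patches plus a cross-entropy term.
    `lam` is a hypothetical weight balancing the two losses."""
    mse = np.mean((recon[masked_idx] - target[masked_idx]) ** 2)
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    ce = -log_probs[label]
    return mse + lam * ce

patches = rng.normal(size=(196, 768))                    # 14x14 patches, dim 768
visible, masked = random_mask(196)
recon = patches + 0.1 * rng.normal(size=patches.shape)   # stand-in decoder output
logits = rng.normal(size=100)                            # e.g. a 100-class head
loss = joint_loss(recon, patches, masked, logits, label=3)
```

In an actual MAE, `recon` would come from a decoder that sees only the visible patches; here a noisy copy stands in so the loss arithmetic is visible.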

A further advantage of MAEs is that they can reconstruct the original input image from a small random subset of its patches. This allows exemplars from past tasks to be stored far more compactly, aiding the CIL process. To further improve performance, we introduce a bilateral MAE framework that fuses information at both the image level and the embedding level, yielding higher-quality reconstructed images and more stable representations.
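The compact-exemplar idea above can be sketched as follows: keep only a random subset of patches (plus their positions) for each stored exemplar, and let the MAE decoder reconstruct the full image at replay time. The function name `store_exemplar` and the 75% mask ratio are illustrative assumptions, not the paper's exact storage scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def store_exemplar(image_patches, mask_ratio=0.75):
    """Keep a random (1 - mask_ratio) fraction of patches and their indices.
    A hypothetical storage scheme: the MAE decoder would later reconstruct
    the full image from these visible patches."""
    n = image_patches.shape[0]
    keep = rng.permutation(n)[: int(n * (1 - mask_ratio))]
    return {"idx": np.sort(keep), "patches": image_patches[keep]}

patches = rng.normal(size=(196, 768)).astype(np.float32)  # one image's patches
exemplar = store_exemplar(patches)

full_bytes = patches.nbytes
stored_bytes = exemplar["patches"].nbytes
# With mask_ratio=0.75, the stored patches occupy 25% of the original memory,
# so the same replay budget holds roughly four times as many exemplars.
```

The memory saving is exactly the kept-patch fraction, which is what makes MAEs attractive for exemplar-based CIL under a fixed storage budget.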

Extensive experiments validate the effectiveness of our approach: it outperforms state-of-the-art methods on the CIFAR-100, ImageNet-Subset, and ImageNet-Full benchmarks. Code is available at https://github.com/scok30/MAE-CIL.