Semi-supervised Learning (SSL) on large-scale unlabeled datasets is vulnerable to out-of-distribution (OOD) samples: over-confident pseudo-labeling mistakenly assigns OOD samples to in-distribution (ID) classes. The root cause is that class-wise latent spaces spread from the closed, seen space into the open, unseen space, and this bias is amplified by SSL's self-training loops. To improve the rejection of OOD samples and enable safe SSL, we propose Prototype Fission (PF).

PF divides each class-wise latent space into compact sub-spaces via automatic fine-grained latent space mining. Specifically, we create multiple unique, learnable sub-class prototypes for each class and optimize them for both diversity and consistency: a Diversity Modeling term encourages each sample to cluster around exactly one sub-class prototype, while a Consistency Modeling term pulls all samples of the same class toward a global class prototype.
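To make the two terms concrete, the following is a minimal PyTorch-style sketch of one possible instantiation; names such as `num_subprotos` and the temperature `tau`, as well as the cosine-similarity losses, are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeFission(nn.Module):
    """Sketch: K learnable sub-class prototypes per class, trained with a
    diversity term (each sample commits to one sub-prototype) and a
    consistency term (samples stay close to their global class prototype)."""

    def __init__(self, num_classes: int, num_subprotos: int, dim: int, tau: float = 0.1):
        super().__init__()
        # (C, K, D) sub-class prototypes and (C, D) global class prototypes.
        self.sub_protos = nn.Parameter(torch.randn(num_classes, num_subprotos, dim))
        self.global_protos = nn.Parameter(torch.randn(num_classes, dim))
        self.tau = tau  # temperature for sub-prototype assignment (assumed)

    def forward(self, feats: torch.Tensor, labels: torch.Tensor):
        feats = F.normalize(feats, dim=-1)            # (B, D)
        subs = F.normalize(self.sub_protos, dim=-1)   # (C, K, D)
        glob = F.normalize(self.global_protos, dim=-1)  # (C, D)

        # Cosine similarity of each sample to its class's K sub-prototypes.
        sub_sim = torch.einsum('bd,bkd->bk', feats, subs[labels])  # (B, K)

        # Diversity term: soft assignment sharpened by tau encourages each
        # sample to commit to exactly one sub-class prototype.
        assign = F.softmax(sub_sim / self.tau, dim=-1)
        div_loss = -(assign * sub_sim).sum(dim=-1).mean()

        # Consistency term: pull every sample toward its global class prototype.
        cons_loss = -(feats * glob[labels]).sum(dim=-1).mean()

        return div_loss, cons_loss
```

A training step would combine the two terms, e.g. `loss = div_loss + lam * cons_loss`, where the weight `lam` (a placeholder hyper-parameter) balances sub-class separation against class-level compactness.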

Unlike conventional approaches that explicitly model the OOD distribution, PF focuses on "closing the set": compact sub-class latent spaces leave little room for OOD samples to fit. This makes PF complementary to existing methods and allows further performance gains when combined with them. Extensive experiments in open-set SSL settings validate PF: it successfully forms sub-classes, discriminates OODs from IDs, and improves overall accuracy. Code for PF will be released.
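One natural rejection rule under this design, sketched below under the same assumptions as above, scores each unlabeled sample by its maximum similarity to any sub-class prototype; samples that fit no compact ID sub-space score low and can be excluded from pseudo-labeling. The function name and threshold are hypothetical.

```python
@torch.no_grad()
def ood_score(model: PrototypeFission, feats: torch.Tensor) -> torch.Tensor:
    """Score = max cosine similarity to any sub-class prototype of any class.
    Low scores suggest the sample fits no compact ID sub-space (likely OOD)."""
    feats = F.normalize(feats, dim=-1)            # (B, D)
    subs = F.normalize(model.sub_protos, dim=-1)  # (C, K, D)
    sim = torch.einsum('bd,ckd->bck', feats, subs)  # (B, C, K)
    return sim.flatten(1).max(dim=-1).values        # (B,)

# Keep only unlabeled samples above a validation-chosen threshold
# before pseudo-labeling; 0.5 here is purely illustrative.
keep_mask = ood_score(model, unlabeled_feats) > 0.5
```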