In many real-world applications, imbalanced datasets pose significant challenges for training classifiers, and the difficulty grows with the size of the dataset. Over-sampling techniques address this problem by interpolating between minority-class instances to synthesize additional minority samples. However, in scenarios such as anomaly detection, minority instances are often scattered across the feature space rather than clustered together, which makes interpolation among minority instances alone less reliable.

Taking inspiration from domain-agnostic data mix-up, we propose an iterative approach to generate synthetic samples by mixing data samples from both minority and majority classes. Developing such a framework is not trivial, as it involves challenges like source sample selection, mix-up strategy selection, and coordination between the underlying model and mix-up strategies.
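
As a minimal illustration of the basic mix-up operation referred to above, the following sketch forms one synthetic sample as a convex combination of a minority and a majority instance. The toy data, the uniform source selection, and the Beta-distributed mixing ratio are assumptions made only for this example; they stand in for the choices that the proposed framework is meant to learn.

```python
import numpy as np

def mix_up(x_a, x_b, y_a, y_b, lam):
    """Return the convex combination of two samples and of their labels."""
    x_new = lam * x_a + (1.0 - lam) * x_b
    y_new = lam * y_a + (1.0 - lam) * y_b
    return x_new, y_new

rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(100, 2))  # toy majority class (label 0)
X_minor = rng.normal(3.0, 2.0, size=(5, 2))    # toy, dispersed minority class (label 1)

# Assumption: sources are drawn uniformly at random for illustration only.
x_min = X_minor[rng.integers(len(X_minor))]
x_maj = X_major[rng.integers(len(X_major))]

# Assumption: a fixed Beta(2, 2) prior on the mixing ratio (hypothetical choice).
lam = rng.beta(2.0, 2.0)

x_syn, y_syn = mix_up(x_min, x_maj, 1.0, 0.0, lam)
```

Here `y_syn` equals `lam`, so the synthetic sample carries a soft label reflecting how much of the minority instance it contains.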

To overcome these challenges, we formulate iterative data mix-up as a Markov decision process (MDP) that maps data attributes onto an augmentation strategy. We solve the MDP with an actor-critic framework adapted to the discrete-continuous decision space. Within this framework, we train a data augmentation policy and design a reward signal that explores classifier uncertainty and encourages performance improvement, regardless of the classifier’s convergence.
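
To make the notion of a discrete-continuous decision concrete, the sketch below samples one hybrid action (a discrete source index and a continuous mixing ratio) and computes a toy reward. Everything here is a hypothetical illustration: the factorized softmax/Beta policy, the names `sample_hybrid_action` and `toy_reward`, and the entropy-plus-gain reward with weight `w` are assumptions, not the formulation used in this work.

```python
import numpy as np

def sample_hybrid_action(logits, alpha, beta, rng):
    """Sample (discrete index, continuous ratio) from a factorized policy."""
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    probs /= probs.sum()
    idx = int(rng.choice(len(probs), p=probs))  # discrete: which source sample
    lam = float(rng.beta(alpha, beta))          # continuous: mixing ratio in (0, 1)
    return idx, lam

def toy_reward(p_syn, score_before, score_after, w=0.5):
    """Hypothetical reward: binary predictive entropy plus validation gain."""
    p = np.clip(p_syn, 1e-12, 1.0 - 1e-12)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return w * entropy + (score_after - score_before)

rng = np.random.default_rng(0)
idx, lam = sample_hybrid_action(np.array([0.1, 1.2, -0.3]), 2.0, 2.0, rng)

# A maximally uncertain prediction (p = 0.5) and a small validation gain.
r = toy_reward(p_syn=0.5, score_before=0.70, score_after=0.72)
```

In an actor-critic setting, an actor would produce `logits`, `alpha`, and `beta` from the current state, while a critic would estimate the expected return used to update the actor.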

To evaluate the proposed framework, we conduct extensive experiments on seven publicly available benchmark datasets using three different types of classifiers. The results demonstrate the potential of our framework for handling imbalanced datasets with diverse minorities.