Risk Bounds for Positive-Unlabeled Learning Under the Selected At Random Assumption

Olivier Coudray, Christine Keribin, Pascal Massart, Patrick Pamphile; 24(107):1−31, 2023.

Abstract

Positive-Unlabeled learning (PU learning) is a variant of semi-supervised binary classification in which only a subset of the positive examples is labeled. The main challenge is to accurately classify the unlabeled examples despite the absence of labeled negative examples. Recent methodologies have been developed to tackle the Selected At Random (SAR) scenario, where the labeling probability depends on the covariates. This paper focuses on establishing risk bounds for PU learning under this general assumption. Additionally, the impact of label noise on PU learning is quantified and compared to the standard classification setting. Lastly, a lower bound on the minimax risk is provided, demonstrating that the upper bound is nearly optimal.
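The PU setting under SAR can be illustrated with a small data-generating sketch (not from the paper; the propensity function and all parameters below are hypothetical choices made only for illustration). Each true positive is labeled with a probability e(x) that depends on its covariates, and the learner observes only the covariates and the labeling indicator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariates and true (hidden) binary labels.
n = 10_000
x = rng.normal(size=n)
y = (x + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Selected At Random (SAR): the probability that a positive example
# gets labeled depends on its covariates through a propensity e(x).
def propensity(x):
    return 1.0 / (1.0 + np.exp(-x))  # hypothetical choice of e(x)

# Only positives can be labeled; each is labeled with probability e(x).
s = y * rng.binomial(1, propensity(x))

# The learner observes only (x, s): s = 1 means "labeled positive",
# s = 0 means "unlabeled" (either an unlabeled positive or a negative).
print(f"positives: {y.sum()}, labeled: {s.sum()}")
```

Under the simpler Selected Completely At Random (SCAR) assumption, e(x) would be a constant; SAR relaxes this by letting the labeling probability vary with x.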
