Selection by Prediction with Conformal p-values
Authors: Ying Jin, Emmanuel J. Candes; Published in volume 24(244) on pages 1-41 in 2023.
Abstract
Decision making and scientific discovery pipelines often involve multiple stages, where initial screening is used to shortlist candidates from a large pool based on predictions from a machine learning model. In this study, we propose a screening procedure that selects candidates whose unobserved outcomes exceed user-specified values. Our method, based on the conformal inference framework, constructs p-values that quantify the statistical evidence for large outcomes and compares them to a threshold from the multiple testing literature to determine the shortlist. In many cases, the procedure selects candidates whose predictions are above a data-dependent threshold. We provide theoretical guarantees under mild exchangeability conditions on the samples, extending existing results on multiple conformal p-values. We demonstrate the empirical performance of our method through simulations and through applications to job hiring and drug discovery datasets.
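As a rough illustration of the two steps described in the abstract (construct conformal p-values, then compare them to a multiple-testing threshold), the sketch below may help. It is not the authors' implementation. It assumes a residual-type score V(x, y) = y - mu(x), a held-out calibration set with observed outcomes, and the Benjamini-Hochberg step-up rule as the multiple-testing threshold; the abstract does not name a specific rule, so that choice is an assumption here. The function names, the toy predictor `mu`, and the synthetic data are all hypothetical.

```python
import numpy as np


def conformal_pvalues(calib_scores, test_scores):
    """P-value for each test unit: (1 + #{calibration scores strictly below
    the test score}) / (n + 1), where n is the calibration set size."""
    calib_sorted = np.sort(np.asarray(calib_scores, dtype=float))
    # side="left" counts the calibration scores strictly less than each test score
    n_below = np.searchsorted(calib_sorted, np.asarray(test_scores, dtype=float), side="left")
    return (1.0 + n_below) / (len(calib_sorted) + 1.0)


def benjamini_hochberg(pvals, q):
    """Indices selected by the Benjamini-Hochberg step-up rule at level q
    (assumed here as the multiple-testing threshold)."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(passed)[0]) + 1  # largest rank meeting its threshold
    return np.sort(order[:k])


# Toy example with synthetic data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)


def mu(x):
    # Hypothetical fitted predictor; any trained model could stand in here.
    return x


x_calib = rng.uniform(0.0, 1.0, size=500)
y_calib = x_calib + 0.1 * rng.standard_normal(500)  # outcomes observed on calibration data
x_test = rng.uniform(0.0, 1.0, size=100)            # outcomes unobserved on test data
c = np.full(100, 0.5)                               # user-specified values to exceed

# Calibration scores use the observed outcome; test scores plug in the value c_j.
calib_scores = y_calib - mu(x_calib)
test_scores = c - mu(x_test)

pvals = conformal_pvalues(calib_scores, test_scores)
selected = benjamini_hochberg(pvals, q=0.1)
print(f"selected {len(selected)} of {len(x_test)} candidates")
```

With this score, a larger prediction `mu(x)` yields a smaller test score and hence a smaller p-value, so the selected set consists of the test units with the largest predictions relative to their values `c`. This is consistent with the abstract's remark that the shortlist often corresponds to predictions above a data-dependent threshold.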