Statistical Comparisons of Classifiers by Generalized Stochastic Dominance

Authors: Christoph Jansen, Malte Nalenz, Georg Schollmeyer, Thomas Augustin; 24(231):1–37, 2023.

Abstract

Comparing classifiers across multiple data sets with respect to multiple quality criteria is a crucial task in the development of machine learning algorithms, yet there is no consensus on how to do it. Every comparison framework faces (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets, and the randomness of data set selection. In this paper, we offer a fresh perspective by adopting recent developments in decision theory. Our framework ranks classifiers by a generalized concept of stochastic dominance, which circumvents the often cumbersome, and sometimes even self-contradictory, reliance on aggregates. We show that generalized stochastic dominance can be operationalized by solving linear programs and statistically tested using an adapted two-sample observation-randomization test. Together, this yields a powerful framework for statistically comparing classifiers over multiple data sets and multiple quality criteria simultaneously. We illustrate and investigate our framework in a simulation study and on a set of standard benchmark data sets.
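For intuition on the observation-randomization idea mentioned in the abstract, the following is a minimal, self-contained sketch of a generic two-sample permutation test on paired per-data-set performance differences. It is not the paper's actual procedure, which tests generalized stochastic dominance via linear programs over preference systems; the test statistic here (mean paired accuracy difference) and all names such as `permutation_test`, `scores_a`, and `scores_b` are hypothetical illustrations.

```python
# Illustrative sketch only: a generic paired two-sample permutation
# (observation-randomization) test, NOT the paper's dominance-based test.
import numpy as np

def permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Test H0: classifiers A and B perform equally, using the mean
    paired score difference across data sets as the test statistic."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    observed = diffs.mean()
    count = 0
    for _ in range(n_permutations):
        # Randomly flipping the sign of each paired difference
        # corresponds to swapping the two classifiers' observations
        # on that data set, which is exchangeable under H0.
        signs = rng.choice([-1.0, 1.0], size=diffs.size)
        if abs((signs * diffs).mean()) >= abs(observed):
            count += 1
    # Two-sided p-value with the standard +1 correction.
    return (count + 1) / (n_permutations + 1)

# Hypothetical example: accuracies of two classifiers on ten data sets.
acc_a = np.array([0.91, 0.84, 0.78, 0.88, 0.93, 0.81, 0.76, 0.89, 0.85, 0.90])
acc_b = np.array([0.89, 0.85, 0.74, 0.86, 0.92, 0.79, 0.75, 0.87, 0.83, 0.88])
print(f"p-value: {permutation_test(acc_a, acc_b):.4f}")
```

The paper's contribution replaces the scalar test statistic above with one derived from generalized stochastic dominance, so that multiple quality criteria are handled simultaneously rather than aggregated into a single number.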
