A Comprehensive Definition of Redundancy and Relevance in Feature Selection Based on Information Decomposition

Patricia Wollstadt, Sebastian Schmitt, Michael Wibral; 24(131):1−44, 2023.

Abstract

In machine learning and statistics, the selection of a minimal set of features that provides maximum information about a target variable is a fundamental task. Information theory offers a powerful framework for designing feature selection algorithms. However, a rigorous definition of feature relevance that accounts for interactions such as redundancy and synergy is still lacking. This gap exists because classical information theory does not provide measures for decomposing the information that a set of variables carries about a target into unique, redundant, and synergistic contributions. Recently, the partial information decomposition (PID) framework introduced such a decomposition. Using PID, we address the conceptual challenges of information-theoretic feature selection and propose a novel definition of feature relevance and redundancy. Based on this definition, we demonstrate that the conditional mutual information (CMI) maximizes relevance while minimizing redundancy. We also present an iterative, CMI-based algorithm for practical feature selection. To illustrate the effectiveness of our algorithm, we compare it to selection based on the unconditional mutual information on benchmark examples and provide PID estimates that quantify the information contribution of features and their interactions in feature-selection problems.
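For readers unfamiliar with PID, the standard two-source decomposition (Williams and Beer, 2010) splits the joint mutual information that sources X_1, X_2 carry about a target Y into four non-negative parts; the notation below is generic and not taken from this paper:

I(X_1, X_2; Y) = \mathrm{Unq}(X_1) + \mathrm{Unq}(X_2) + \mathrm{Red}(X_1, X_2) + \mathrm{Syn}(X_1, X_2),
\qquad I(X_i; Y) = \mathrm{Unq}(X_i) + \mathrm{Red}(X_1, X_2).

The sketch below illustrates the general idea of iterative, CMI-based forward selection that the abstract refers to: at each step, add the candidate feature with maximal conditional mutual information given the features already selected. It is a minimal plug-in implementation that assumes small, discrete data; all names are illustrative and this is not the authors' code.

# Greedy forward feature selection by conditional mutual information (CMI).
# Plug-in estimators; suitable only for small, discrete data sets.
import numpy as np

def entropy(*cols):
    """Plug-in joint entropy (bits) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def cmi(x, y, z_cols):
    """I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    if not z_cols:
        return entropy(x) + entropy(y) - entropy(x, y)
    return (entropy(x, *z_cols) + entropy(y, *z_cols)
            - entropy(x, y, *z_cols) - entropy(*z_cols))

def select_features(X, y, k):
    """Greedily add the feature with maximal CMI given those already chosen."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        z = [X[:, j] for j in selected]
        scores = {j: cmi(X[:, j], y, z) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= 0:  # no remaining candidate adds information
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Example: x1 is a redundant copy of the informative feature x0; once x0 is
# selected, the copy's CMI is exactly zero, so it is never added next.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 500)
X = np.stack([x0, x0.copy(), rng.integers(0, 2, 500)], axis=1)
print(select_features(X, x0, k=2))

Note that plug-in estimates degrade quickly as the conditioning set grows, which is why practical CMI-based selection relies on more sophisticated estimators.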
