Robust Subsampling Techniques for Outlier Handling in Persistent Homology

Outlier-Robust Subsampling Techniques for Persistent Homology

Bernadette J. Stolz; 24(90):1−35, 2023.

Abstract

Persistent homology has gained popularity in various real-world data applications in recent years. However, the scalability of persistent homology algorithms remains a challenge, especially for large datasets. To address this issue, one approach is to select a subset of landmarks through subsampling from the data. Currently, random selection and the maxmin algorithm are commonly used for landmark selection. However, random selection tends to favor dense areas of the data, while the maxmin algorithm is sensitive to noise. In this study, we propose a novel approach to selecting landmarks specifically for persistent homology, which preserves coarse topological information of the original dataset. Our method is motivated by the Mayer-Vietoris sequence and only requires local persistent homology calculations, enabling efficient computation. We evaluate our landmark selection method on artificial datasets with varying levels of noise and compare it to standard techniques. The results demonstrate that our approach outperforms standard methods and a subsampling technique based on an outlier-robust version of the k-means algorithm, particularly for low sampling densities in noisy data, in terms of robustness to outliers.

[abs]

[pdf][bib]

[code]