Large sample spectral analysis of graph-based multi-manifold clustering

Authors: Nicolas Garcia Trillos, Pengfei He, Chenghui Li; Journal of Machine Learning Research (JMLR), 24(143):1−71, 2023.

Abstract

This study focuses on the statistical properties of graph-based algorithms for multi-manifold clustering (MMC). MMC aims to identify the multi-manifold structure underlying a given Euclidean dataset, assuming that the data is obtained by sampling a distribution on a union of manifolds $\\M = \\M_1 \\cup\\dots \\cup \\M_N$ that may intersect and have different dimensions. The paper investigates the sufficient conditions that similarity graphs on datasets must satisfy in order for their corresponding graph Laplacians to accurately capture the geometric information necessary to solve the MMC problem. Specifically, the paper provides high probability error bounds for the spectral approximation of a tensorized Laplacian on $\\M$ using a suitable graph Laplacian constructed from the observations. The recovered tensorized Laplacian contains all the geometric information of the individual underlying manifolds. The paper also presents a family of similarity graphs called annular proximity graphs with angle constraints, which satisfy these sufficient conditions. This family of graphs is compared with other constructions in the literature based on the alignment of tangent planes. The paper concludes with extensive numerical experiments that further validate the insights provided by the theory on the MMC problem.

[abs]

[pdf][bib]
      
[code]