A Geometric Approach to Sparse PCA

Dimitris Bertsimas, Driss Lahlou Kitane; 24(32):1−33, 2023.

Abstract

This paper presents a novel algorithm called GeoSPCA for solving the problem of maximizing the variance explained from a data matrix using orthogonal sparse principal components with a fixed cardinality. Unlike existing methods that build principal components iteratively through deflation, GeoSPCA builds all principal components at once while ensuring orthogonality. The algorithm is based on the left eigenvalues of the covariance matrix, which approximates the optimal solution by formulating it as a binary linear optimization problem. This approximation can handle various versions of the sparse PCA problem, including cases where the principal components have the same support or disjoint supports, as well as the Structured Sparse PCA problem. The paper also provides optimality bounds and demonstrates the benefits of GeoSPCA in real-world problems, showing improvements of up to 24% in terms of variance compared to the greedy algorithm. Additionally, GeoSPCA is capable of solving problems with thousands of variables and support cardinality of hundreds within minutes. The algorithm is also applied to a face recognition problem, outperforming other PCA-based techniques such as structured sparse PCA by more than 10%.

[abs]

[pdf][bib]