Multivariate Soft Rank via Entropy-Regularized Optimal Transport: Sample Efficiency and Generative Modeling

Shoaib Bin Masud, Matthew Werenski, James M. Murphy, Shuchin Aeron; 24(160):1–65, 2023.

Abstract

This paper introduces the multivariate soft rank, which addresses limitations of existing rank-based goodness-of-fit (GoF) statistics in computational cost, sample complexity, and differentiability. The soft rank is defined as the entropic transport map obtained from the entropy-regularized optimal transport problem. Two new statistics are proposed: the soft rank energy (sRE) and the soft rank maximum mean discrepancy (sRMMD). The paper provides non-asymptotic convergence rates for the sample estimate of the entropic transport map and shows that the sample estimates of sRE and sRMMD converge rapidly to their population versions. Because the entropy-regularized optimal transport problem can be solved efficiently, rank-based GoF statistics can be computed efficiently even in high dimensions. The sample estimates of sRE and sRMMD are also differentiable with respect to the data, which makes them suitable for machine learning frameworks that rely on gradient-based optimization. The paper demonstrates the utility of these statistics in generative modeling, specifically in image generation and in generating valid knockoffs for controlled feature selection.
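To make the construction concrete, the following is a minimal NumPy sketch of how a sample soft rank and a soft rank energy could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sinkhorn_plan`, `soft_rank`, `soft_rank_energy`), the half squared Euclidean cost, the choice of uniform reference samples on the unit cube, the pooling of the two samples before ranking, and the values of `eps` and `n_iters` are all choices made here for illustration. The soft rank is taken to be the barycentric projection of the Sinkhorn coupling, and the statistic is a plain energy distance between the soft ranks of the two samples.

```python
import numpy as np


def _logsumexp(z, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def sinkhorn_plan(cost, eps, n_iters=300):
    """Entropic OT coupling between two uniform empirical measures.

    Log-domain Sinkhorn iterations; `eps` is the regularization strength.
    """
    n, m = cost.shape
    log_a = np.full(n, -np.log(n))  # uniform weights on the data points
    log_b = np.full(m, -np.log(m))  # uniform weights on the reference points
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    return np.exp(log_plan)


def soft_rank(points, reference, eps):
    """Sample entropic map: barycentric projection of the coupling onto `reference`."""
    diff = points[:, None, :] - reference[None, :, :]
    cost = 0.5 * np.sum(diff ** 2, axis=-1)  # assumed cost: half squared distance
    plan = sinkhorn_plan(cost, eps)
    # Each row of `plan` is a weighting over reference points; normalize and average.
    return (plan @ reference) / plan.sum(axis=1, keepdims=True)


def energy_distance(a, b):
    """V-statistic energy distance between two point clouds."""
    def mean_dist(p, q):
        return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1).mean()
    return 2.0 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b)


def soft_rank_energy(x, y, reference, eps):
    """Energy distance between soft ranks of x and y, ranked against the pooled sample."""
    pooled = np.vstack([x, y])
    ranks = soft_rank(pooled, reference, eps)
    return energy_distance(ranks[: len(x)], ranks[len(x):])


# Illustrative usage with synthetic data (all values are arbitrary choices).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y_shifted = rng.normal(loc=3.0, size=(100, 2))
y_same = rng.normal(size=(100, 2))
u = rng.uniform(size=(200, 2))  # reference samples on the unit square

print("shifted:", soft_rank_energy(x, y_shifted, u, eps=0.5))
print("same   :", soft_rank_energy(x, y_same, u, eps=0.5))
```

Every operation in this sketch is smooth in the inputs, which is the property that makes statistics of this kind usable as training losses; to obtain gradients in practice, the same computation would need to be written in an automatic differentiation framework rather than plain NumPy.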
