Universal Metric Embeddings with Small Transformers

Anastasis Kratsios, Valentin Debarnot, Ivan Dokmanić; 24(170):1−48, 2023.

Abstract

This study focuses on representing data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures equipped with a transport metric (Delon and Desolneux, 2020). We provide embedding guarantees for feature maps implemented by small neural networks called probabilistic transformers. Our guarantees are of memorization type: we show that a probabilistic transformer of depth about $n\log(n)$ and width about $n^2$ can bi-Hölder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We also derive probabilistic bi-Lipschitz guarantees, which trade off the amount of distortion against the probability that a randomly chosen pair of points embeds with that distortion. If the geometry of $\mathcal{X}$ is sufficiently regular, we obtain stronger bi-Lipschitz guarantees for all points. As applications, we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs. When embedding into multivariate Gaussian mixtures, we show that probabilistic transformers compute bi-Hölder embeddings with arbitrarily small distortion. Our results show that any finite metric dataset, from vertices on a graph to functions in a function space, can be faithfully represented in a single representation space using a simple transformer architecture. Consequently, a modular set of machine learning tools compatible with this one representation space, many of which already exist, may suffice for downstream supervised and unsupervised learning from a wide range of data types.
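To make the representation space concrete, the following is a minimal sketch, assuming the mixture-Wasserstein construction of Delon and Desolneux (2020): the distance between two univariate Gaussian mixtures is the optimal-transport cost between their mixture weights, with the squared 2-Wasserstein distance between Gaussian components as ground cost. The helper names (`w2_gaussian_sq`, `mw2`) and the use of SciPy's linear-programming solver are illustrative choices, not the authors' implementation.

```python
# Hedged sketch of the mixture-Wasserstein (MW2) distance between two univariate
# Gaussian mixtures, following the Delon-Desolneux construction; helper names and
# the LP solver are illustrative, not the paper's code.
import numpy as np
from scipy.optimize import linprog

def w2_gaussian_sq(m0, s0, m1, s1):
    """Squared 2-Wasserstein distance between univariate Gaussians N(m0, s0^2), N(m1, s1^2)."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2

def mw2(weights0, means0, stds0, weights1, means1, stds1):
    """MW2: optimal transport between mixture weights with W2^2 ground cost between components."""
    K, L = len(weights0), len(weights1)
    cost = np.array([[w2_gaussian_sq(means0[k], stds0[k], means1[l], stds1[l])
                      for l in range(L)] for k in range(K)])
    # Coupling constraints: rows sum to weights0, columns sum to weights1.
    A_eq, b_eq = [], []
    for k in range(K):
        row = np.zeros((K, L)); row[k, :] = 1.0
        A_eq.append(row.ravel()); b_eq.append(weights0[k])
    for l in range(L):
        col = np.zeros((K, L)); col[:, l] = 1.0
        A_eq.append(col.ravel()); b_eq.append(weights1[l])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return float(np.sqrt(res.fun))

# Example: a two-component mixture versus a single standard Gaussian.
print(mw2([0.5, 0.5], [-1.0, 1.0], [0.3, 0.3], [1.0], [0.0], [1.0]))
```

Because both mixtures are univariate Gaussian, each ground cost has a closed form, so the whole distance reduces to a small discrete transport problem over the mixture weights; this is what makes the target space computationally convenient as a representation space.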
