MMD Aggregated Two-Sample Test

Authors: Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton; 24(194):1–81, 2023.

Abstract

The paper introduces two novel nonparametric two-sample kernel tests based on the Maximum Mean Discrepancy (MMD). The first test uses either permutations or a wild bootstrap to determine the test threshold for a fixed kernel. This test is proven to control the probability of type I error non-asymptotically, making it reliable even for small sample sizes; previous MMD tests, in contrast, guarantee the correct test level only asymptotically. The authors further prove the minimax optimality of this MMD test with a specific kernel when the difference in densities lies in a Sobolev ball. Since the smoothness parameter of the Sobolev ball is unknown in practice, they construct a second, aggregated test called MMDAgg, which adapts to this parameter: it maximises test power over the collection of kernels used, without requiring held-out data for kernel selection or an arbitrary kernel choice. The authors prove that MMDAgg still controls the level non-asymptotically and achieves the minimax rate over Sobolev balls. These guarantees are not restricted to a particular type of kernel; they hold for any product of one-dimensional translation-invariant characteristic kernels. A user-friendly, parameter-free implementation of MMDAgg using an adaptive collection of bandwidths is provided. On synthetic data satisfying the Sobolev smoothness assumption, MMDAgg is shown to outperform alternative MMD-based two-sample tests, and on real-world image data it matches the power of tests that rely on models such as neural networks.
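To make the mechanics concrete, here is a minimal NumPy sketch of the fixed-kernel permutation test described above, together with a deliberately simplified aggregation over bandwidths. This is an illustration, not the authors' implementation: the function names, the Gaussian kernel choice, and the Bonferroni-style correction in `mmd_agg_bonferroni` are assumptions made for exposition (MMDAgg itself uses a less conservative, data-driven level adjustment; see the [code] link for the official package).

```python
import numpy as np

def gaussian_kernel(Z, bandwidth):
    """Gaussian kernel matrix on the pooled sample Z (illustrative kernel choice)."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / (2.0 * bandwidth**2))

def mmd_sq(K, m):
    """Biased (V-statistic) estimate of squared MMD from the pooled kernel
    matrix; the first m rows/columns correspond to the first sample."""
    return K[:m, :m].mean() + K[m:, m:].mean() - 2.0 * K[:m, m:].mean()

def mmd_permutation_test(X, Y, bandwidth, n_perms=500, alpha=0.05, seed=0):
    """Fixed-kernel MMD test with a permutation threshold.

    Including the original statistic among the permuted ones is what yields
    the non-asymptotic level control highlighted in the abstract."""
    rng = np.random.default_rng(seed)
    m = len(X)
    Z = np.vstack([X, Y])
    K = gaussian_kernel(Z, bandwidth)
    stat = mmd_sq(K, m)
    stats = [stat]  # identity permutation counts as one of the n_perms + 1
    for _ in range(n_perms):
        idx = rng.permutation(len(Z))
        stats.append(mmd_sq(K[np.ix_(idx, idx)], m))
    # reject when the observed statistic exceeds the empirical (1 - alpha)
    # quantile of all n_perms + 1 statistics
    k = int(np.ceil((n_perms + 1) * (1 - alpha)))
    threshold = np.sort(stats)[k - 1]
    return stat > threshold

def mmd_agg_bonferroni(X, Y, bandwidths, alpha=0.05):
    """Crude aggregation: reject if any single-bandwidth test rejects at the
    Bonferroni-corrected level. This controls the level but is more
    conservative than the MMDAgg correction described in the paper."""
    adj_alpha = alpha / len(bandwidths)
    return any(mmd_permutation_test(X, Y, bw, alpha=adj_alpha) for bw in bandwidths)
```

For actual use, the authors' released implementation (linked below) provides the proper MMDAgg level correction, the wild bootstrap alternative, and the adaptive bandwidth collection mentioned in the abstract.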

[pdf] [bib] [code]