Asymptotics of Network Embeddings Learned via Subsampling

Authors: Andrew Davison, Morgane Austern; Publication Date: 2023; Journal of Machine Learning Research, 24(138):1−120

Abstract

Network data plays a crucial role in modern machine learning, underpinning tasks such as node classification, node clustering, and link prediction. A common approach is to first learn a Euclidean embedding of the network and then apply algorithms designed for vector-valued data. For large networks, embeddings are typically learned with stochastic gradient methods, for which the subsampling scheme can be chosen freely. Despite the strong empirical performance of these methods, their theoretical understanding is limited. In this work, we propose a unified framework that encompasses representation methods using a subsampling approach, such as node2vec. We prove that, under the assumption that the graph is exchangeable, the distribution of the learned embedding vectors asymptotically decouples. Moreover, we characterize the asymptotic distribution and provide rates of convergence in terms of the latent parameters, including the choice of loss function and the embedding dimension. This theoretical foundation allows us to understand what the embedding vectors represent and to evaluate the performance of these methods on downstream tasks. Notably, we find that commonly used loss functions may have shortcomings, such as a lack of Fisher consistency.
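To make the class of methods the abstract refers to concrete, the following is a minimal sketch of subsampling-based embedding learning in the style of node2vec: node pairs are subsampled via random walks, and a skip-gram loss with negative sampling is minimized by stochastic gradient descent. All names and parameter choices here (`random_walk`, `sgd_embeddings`, `walks_per_node`, etc.) are illustrative assumptions, not taken from the paper or its code release.

```python
import numpy as np

def random_walk(adj_list, start, length, rng):
    """Sample a simple random walk (node2vec with p = q = 1)."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj_list[walk[-1]]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk

def sgd_embeddings(adj_list, dim=2, walks_per_node=10, walk_len=5,
                   window=2, neg_samples=5, lr=0.025, seed=0):
    """Learn node embeddings by SGD over subsampled (positive, negative)
    node pairs, using a skip-gram loss with negative sampling."""
    rng = np.random.default_rng(seed)
    n = len(adj_list)
    emb = rng.normal(scale=0.1, size=(n, dim))  # "in" (embedding) vectors
    ctx = rng.normal(scale=0.1, size=(n, dim))  # "out" (context) vectors
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(walks_per_node):
        for start in range(n):
            walk = random_walk(adj_list, start, walk_len, rng)
            for i, u in enumerate(walk):
                # positive pairs: nodes co-occurring within the walk window
                for v in walk[max(0, i - window): i + window + 1]:
                    if v == u:
                        continue
                    eu = emb[u].copy()
                    g = sigmoid(emb[u] @ ctx[v]) - 1.0
                    emb[u] -= lr * g * ctx[v]
                    ctx[v] -= lr * g * eu
                    # negative pairs: uniformly subsampled "noise" nodes
                    for w in rng.integers(0, n, neg_samples):
                        eu = emb[u].copy()
                        g = sigmoid(emb[u] @ ctx[w])
                        emb[u] -= lr * g * ctx[w]
                        ctx[w] -= lr * g * eu
    return emb

# Toy usage: two triangles joined by an edge, given as adjacency lists.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
Z = sgd_embeddings(adj, dim=2)
```

In this sketch, the random-walk sampler plays the role of the freely chosen subsampling scheme, and the per-pair sigmoid loss is one instance of the loss functions whose asymptotic behavior the paper analyzes.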
