Graph Attention Retrospective
Kimon Fountoulakis, Amit Levi, Shenghao Yang, Aseem Baranwal, Aukosh Jagannath; Journal of Machine Learning Research, 24(246):1–52, 2023.
Abstract
Graph-based learning is a rapidly growing sub-field of machine learning that finds applications in various domains such as social networks, citation networks, and bioinformatics. One of the most widely used models in this field is the graph attention network. These networks were introduced to allow nodes to aggregate information from their neighboring nodes in a non-uniform way, unlike simple graph convolution methods that treat all neighbors equally. In this paper, we present a theoretical study on the behavior of graph attention networks. We provide several results on the performance of the graph attention mechanism for node classification in a contextual stochastic block model. In this model, node features are derived from a mixture of Gaussians, and the edges are generated using a stochastic block model. We demonstrate that in an “easy” regime, where the means of the Gaussians are sufficiently far apart, the graph attention mechanism is able to effectively distinguish inter-class edges from intra-class edges. This allows it to assign higher weights to important edges and significantly reduce the weights of unimportant edges, resulting in perfect node classification. However, in the “hard” regime, we prove that no attention mechanism can accurately distinguish intra-class edges from inter-class edges. Additionally, we show that even if intra-class edges could be separated from inter-class edges, graph attention convolution is unable to achieve perfect node classification. In addition to perfect node classification, we also present a positive result on the robustness of graph attention against structural noise in the graph. Our robustness result implies that graph attention can outperform both simple graph convolution and the best linear classifier of node features. We validate our theoretical findings using synthetic and real-world datasets.
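To make the setting concrete, below is a minimal sketch of the contextual stochastic block model and an attention-style convolution on it. The parameter values, the bilinear scoring function, and the final sign-based classifier are illustrative assumptions for this sketch, not the paper's exact architecture or analysis (the paper studies GAT-style attention); only the data model (two-class SBM edges with Gaussian node features with means ±μ) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- CSBM parameters (illustrative values, not from the paper) ---
n, d = 200, 8            # number of nodes, feature dimension
p, q = 0.5, 0.1          # intra-class / inter-class edge probabilities
mu = np.ones(d) * 1.5    # class mean direction; classes have means +mu and -mu

# --- Sample node labels and Gaussian-mixture features ---
y = rng.integers(0, 2, size=n)                 # labels in {0, 1}
signs = 2 * y - 1                              # map to {-1, +1}
X = signs[:, None] * mu + rng.normal(size=(n, d))

# --- Sample a symmetric two-block stochastic block model adjacency ---
same = (y[:, None] == y[None, :])
probs = np.where(same, p, q)
A = (rng.random((n, n)) < probs).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # symmetric, no self-loops yet
A += np.eye(n)                                 # self-loops for aggregation

# --- A toy single-head attention convolution (hypothetical scoring) ---
# Score each edge by a bilinear function of its endpoint features,
# then softmax over each node's neighborhood (masked by A).
W = rng.normal(size=(d, d)) * 0.1              # stand-in for a learned weight
scores = X @ W @ X.T                           # pairwise attention logits
scores = np.where(A > 0, scores, -np.inf)      # restrict to existing edges
scores -= scores.max(axis=1, keepdims=True)    # numerical stability
alpha = np.exp(scores)
alpha /= alpha.sum(axis=1, keepdims=True)      # row-stochastic attention weights

H = alpha @ X                                  # attention-weighted aggregation
pred = (H @ mu > 0).astype(int)                # classify by projection onto mu
print("accuracy:", (pred == y).mean())
```

In the "easy" regime described above (large separation between +mu and -mu), a well-chosen scoring function can drive the attention weights on inter-class edges toward zero, which is the mechanism behind the paper's perfect-classification result; this sketch only illustrates the pipeline, not that guarantee.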