Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search

Benjamin Moseley, Joshua R. Wang; 24(1):1−36, 2023.

Abstract

This paper aims to provide an analytical framework for understanding hierarchical clustering, a widely used data analysis method. By establishing a well-understood foundation, this framework can support existing methods and guide future improvements. The paper focuses on the dual of a problem framework for hierarchical clustering introduced by Dasgupta. The main finding is that the popular algorithm average linkage agglomerative clustering has a small constant approximation ratio for this objective. On the other hand, several other popular algorithms, including bisecting k-means divisive clustering, have a very poor lower bound on their approximation ratio for the same objective. However, the paper presents two constant approximation algorithms for divisive clustering that perform well with respect to this objective. This work is among the first to provide guarantees on widely used hierarchical algorithms for a natural objective function. The objective and analysis shed light on what these popular algorithms optimize and when they are likely to perform well.

[abs]

[pdf][bib]