Clustering in Hilbert simplex geometry

Frank Nielsen, Ke Sun

Clustering categorical distributions in the probability simplex is a fundamental task met in many applications dealing with normalized histograms. Traditionally, the differential-geometric structures of the probability simplex have been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the dualistic information-geometric structure induced by a smooth dissimilarity measure, the Kullback-Leibler divergence. In this work, we introduce for this clustering task a novel computationally-friendly framework for modeling the probability simplex termed {\em Hilbert simplex geometry}. In the Hilbert simplex geometry, the distance function is described by a polytope. We discuss the pros and cons of those different statistical modelings, and benchmark experimentally these geometries for center-based $k$-means and $k$-center clusterings. We show that Hilbert metric in the probability simplex satisfies the property of information monotonicity. Furthermore, since a canonical Hilbert metric distance can be defined on any bounded convex subset of the Euclidean space, we also consider Hilbert's projective geometry of the elliptope of correlation matrices and study its clustering performances.

Knowledge Graph



Sign up or login to leave a comment