Datasets as Interacting Particle Systems: a Framework for Clustering

Giuliano Armano, Marco Alberto Javarone

In this paper we propose a framework, inspired by interacting particle physics, for clustering multidimensional datasets. A given dataset is modeled as an interacting particle system, under the assumption that each element of the dataset corresponds to a distinct particle and that particle interactions are rendered through Gaussian potentials. The way particle interactions are evaluated depends on a parameter that controls the shape of the underlying Gaussian model, so that, in principle, different clusters of proximal particles can be identified depending on the value adopted for the parameter. This degree of freedom in the Gaussian potentials has been introduced to allow multiresolution analysis. In particular, given a standard community detection algorithm, multiresolution analysis is put into practice by repeatedly running the algorithm on a set of adjacency matrices, each dependent on a specific value of the parameter that controls the shape of the Gaussian potentials. As a result, different partitioning schemes are obtained for the given dataset, so that the information it contains can be better highlighted, with the goal of identifying the most appropriate number of clusters. Solutions obtained on synthetic datasets allowed us to identify a recurring pattern, which appears to be useful for identifying optimal solutions when analysing other synthetic and real datasets.
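The pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the Gaussian weight is assumed to be exp(-d²/(2σ²)) with σ as the shape parameter, the names `gaussian_adjacency`, `threshold_components`, `multiresolution_scan` and the `cutoff` value are hypothetical, and connected components of a thresholded graph stand in for the community detection algorithm, which the abstract does not name.

```python
import math


def gaussian_adjacency(points, sigma):
    """Build a weighted adjacency matrix from pairwise Gaussian weights.

    Assumed form (not from the paper): w_ij = exp(-d_ij^2 / (2 * sigma^2)).
    """
    n = len(points)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            adjacency[i][j] = adjacency[j][i] = w
    return adjacency


def threshold_components(adjacency, cutoff=0.5):
    """Stand-in for community detection (an assumption of this sketch):
    label connected components using only edges with weight >= cutoff."""
    n = len(adjacency)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and adjacency[i][j] >= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def multiresolution_scan(points, sigmas, cutoff=0.5):
    """Run the partitioning step once per shape-parameter value and
    report how many clusters each value yields."""
    counts = {}
    for sigma in sigmas:
        labels = threshold_components(gaussian_adjacency(points, sigma), cutoff)
        counts[sigma] = len(set(labels))
    return counts


# Toy dataset: two tight groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
cluster_counts = multiresolution_scan(points, sigmas=[0.01, 0.5, 10.0])
print(cluster_counts)
```

On this toy dataset a very narrow Gaussian isolates every point, an intermediate one recovers the two groups, and a very wide one merges everything, which is the kind of resolution-dependent behaviour a multiresolution analysis would inspect. A modularity-based community detection routine could replace `threshold_components` without changing the rest of the sketch.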
