Coresets for $k$-Means and $k$-Median Clustering and their Applications

Sariel Har-Peled, Soham Mazumdar

$\renewcommand{\Re}{{\rm I\!\hspace{-0.025em} R}} \newcommand{\eps}{{\varepsilon}} \newcommand{\Coreset}{{\mathcal{S}}} $ In this paper, we show the existence of small coresets for the problems of computing $k$-median and $k$-means clustering for points in low dimension. In other words, we show that given a point set $P$ in $\Re^d$, one can compute a weighted set $\Coreset \subseteq P$, of size $O(k \eps^{-d} \log{n})$, such that one can compute the $k$-median/means clustering on $\Coreset$ instead of on $P$, and get an $(1+\eps)$-approximation. As a result, we improve the fastest known algorithms for $(1+\eps)$-approximate $k$-means and $k$-median clustering. Our algorithms have linear running time for a fixed $k$ and $\eps$. In addition, we can maintain the $(1+\eps)$-approximate $k$-median or $k$-means clustering of a stream when points are being only inserted, using polylogarithmic space and update time.

Knowledge Graph



Sign up or login to leave a comment