Improved Approximation Algorithms for Earth-Mover Distance in Data Streams

Arman Yousefi, Rafail Ostrovsky

For two multisets $S$ and $T$ of points in $[\Delta]^2$ with $|S| = |T| = n$, the earth-mover distance (EMD) between $S$ and $T$ is the minimum cost of a perfect bipartite matching between the points of $S$ and the points of $T$, i.e., $\mathrm{EMD}(S,T) = \min_{\pi:S\rightarrow T}\sum_{a\in S}\|a-\pi(a)\|_1$, where $\pi$ ranges over all one-to-one mappings. The sketching complexity of approximating earth-mover distance in the two-dimensional grid is one of the open problems mentioned in the literature. We give two algorithms for computing EMD between two multisets when the number of distinct points in one set is a small value $k=\log^{O(1)}(\Delta n)$. Our first algorithm gives a $(1+\epsilon)$-approximation using $O(k\epsilon^{-2}\log^{4}n)$ space and works only in the insertion-only model. The second algorithm gives an $O(\min(k^3,\log\Delta))$-approximation using $O(\log^{3}\Delta\cdot\log\log\Delta\cdot\log n)$ space in the turnstile model.
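
To make the definition concrete, the following minimal sketch evaluates EMD exactly by trying every one-to-one mapping $\pi$ and keeping the cheapest. It is an illustration of the definition only, not either of the streaming algorithms described above: it stores both multisets in full and takes factorial time, so it is usable only for tiny inputs. The names `l1` and `emd_bruteforce` are hypothetical and chosen for this example.

```python
from itertools import permutations


def l1(a, b):
    """L1 (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def emd_bruteforce(S, T):
    """Exact EMD between two equal-size multisets of 2D points.

    Enumerates every bijection from S to T (hypothetical helper,
    exponential time; for illustrating the definition only).
    """
    if len(S) != len(T):
        raise ValueError("EMD as defined here needs |S| == |T|")
    # Each permutation of T's indices is one candidate mapping pi.
    return min(
        sum(l1(a, T[j]) for a, j in zip(S, perm))
        for perm in permutations(range(len(T)))
    )


if __name__ == "__main__":
    S = [(0, 0), (2, 2)]
    T = [(0, 1), (2, 3)]
    print(emd_bruteforce(S, T))
```

In this example the optimal mapping sends $(0,0)$ to $(0,1)$ and $(2,2)$ to $(2,3)$, each at cost 1, while the only other mapping costs $5 + 3 = 8$.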
