Improved Distortion and Spam Resistance for PageRank

Lucas Farach-Colton, Martin Farach-Colton, Leslie Ann Goldberg, John Lapinskas, Reut Levi, Moti Medina, Miguel Mosteiro

For a directed graph $G = (V,E)$, a ranking function, such as PageRank, provides a way of mapping elements of $V$ to non-negative real numbers so that nodes can be ordered. Brin and Page argued that the stationary distribution, $R(G)$, of a random walk on $G$ is an effective ranking function for queries on an idealized web graph. However, $R(G)$ is not defined for all $G$, and in particular, it is not defined for the real web graph. Thus, they introduced PageRank to approximate $R(G)$ for graphs $G$ with ergodic random walks while being defined on all graphs. PageRank is defined as a random walk on a graph, where with probability $(1-\epsilon)$, a random out-edge is traversed, and with \emph{reset probability} $\epsilon$ the random walk instead restarts at a node selected using a \emph{reset vector} $\hat{r}$. Originally, $\hat{r}$ was taken to be uniform on the nodes, and we call this version UPR. In this paper, we introduce graph-theoretic notions of quality for ranking functions, specifically \emph{distortion} and \emph{spam resistance}. We show that UPR has high distortion and low spam resistance and we show how to select an $\hat{r}$ that yields low distortion and high spam resistance.

Knowledge Graph



Sign up or login to leave a comment