Spectrally-truncated kernel ridge regression and its free lunch

Arash A. Amini

Kernel ridge regression (KRR) is a popular nonparametric regression approach with many desirable properties, including minimax rate-optimality in estimating functions that belong to common reproducing kernel Hilbert spaces (RKHS). The approach, however, is computationally intensive for large data sets, because it operates on a dense $n \times n$ kernel matrix, where $n$ is the sample size. Recently, various approximation schemes for solving KRR have been considered, and some have been analyzed. Some approaches, such as the Nyström approximation and sketching, have been shown to preserve the rate optimality of KRR. In this paper, we consider the simplest approximation, namely, spectrally truncating the kernel matrix to its largest $r < n$ eigenvalues. We derive an exact expression for the maximum risk of this truncated KRR over the unit ball of the RKHS. This result can be used to study the exact trade-off between the level of spectral truncation and the regularization parameter. We show that, as long as the RKHS is infinite-dimensional, there is a threshold on $r$ above which the spectrally-truncated KRR, surprisingly, outperforms the full KRR in terms of the minimax risk, where the minimum is taken over the regularization parameter. This strengthens the existing results on approximation schemes by showing that not only does one not lose in terms of the rates, but truncation can in fact improve the performance for all finite samples (above the threshold). Moreover, we show that the implicit regularization achieved by spectral truncation is not a substitute for Hilbert norm regularization: both are needed to achieve the best performance.
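To make the idea of spectral truncation concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the common KRR parameterization in which the fitted values at the training points are $K (K + n\lambda I)^{-1} y$ for an unnormalized kernel matrix $K$; the paper's own scaling may differ. The Gaussian kernel, its bandwidth, the synthetic data, and the function names `gaussian_kernel` and `truncated_krr_fit` are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, z, bandwidth=0.5):
    # Illustrative kernel choice (an assumption, not taken from the paper).
    d2 = (x[:, None] - z[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def truncated_krr_fit(K, y, lam, r):
    """Fitted values at the training points when K is replaced by its
    rank-r spectral truncation (top r eigenpairs) and ridge penalty
    n * lam is applied. With r = n this reduces to K (K + n lam I)^{-1} y."""
    n = K.shape[0]
    mu, U = np.linalg.eigh(K)            # eigenvalues in ascending order
    idx = np.argsort(mu)[::-1][:r]       # indices of the r largest eigenvalues
    mu_r, U_r = mu[idx], U[:, idx]
    shrink = mu_r / (mu_r + n * lam)     # ridge shrinkage per retained direction
    return U_r @ (shrink * (U_r.T @ y))

# Synthetic example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)
K = gaussian_kernel(x, x)

fit_full = truncated_krr_fit(K, y, lam=0.01, r=n)    # full KRR
fit_trunc = truncated_krr_fit(K, y, lam=0.01, r=10)  # keep top 10 eigenpairs
```

In this sketch both tuning knobs appear explicitly: `r` controls the spectral truncation and `lam` the Hilbert norm regularization, which matches the abstract's point that the two play distinct roles.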
