Gradient Regularisation as Approximate Variational Inference

Ali Unlu, Laurence Aitchison

Variational inference in Bayesian neural networks is usually performed using stochastic sampling which gives very high-variance gradients, and hence slow learning. Here, we show that it is possible to obtain a deterministic approximation of the ELBO for a Bayesian neural network by doing a Taylor-series expansion around the mean of the current variational distribution. The resulting approximate ELBO is the training-log-likelihood plus a squared gradient regulariser. In addition to learning the approximate posterior variance, we also consider a uniform-variance approximate posterior, inspired by the stationary distribution of SGD. The corresponding approximate ELBO has a simple form, as the log-likelihood plus a simple squared-gradient regulariser. We argue that this squared-gradient regularisation may at the root of the excellent empirical performance of SGD.

arrow_drop_up