Strength of Minibatch Noise in SGD

Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda

The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning. In this work, we study the nature of SGD noise and fluctuation. We show that some degree of mismatch between model and data complexity is needed for SGD to "stir" noise; such a mismatch may be due to label or input noise, regularization, or underparametrization. Compared with previous works, the present work focuses on deriving exactly solvable analytical results. Our work also motivates a more accurate general formulation of minibatch noise, and we show that SGD noise takes different shapes and strengths in different kinds of minima.
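The claim that a mismatch is needed to "stir" noise can be checked numerically in a toy setting. The sketch below is not taken from the paper: it assumes plain linear regression with squared loss, and the function name `minibatch_noise_at_minimum` and all of its parameters are hypothetical choices made for illustration. It measures the trace of the minibatch-gradient covariance at the full-batch minimum, once with clean labels (the model fits the data exactly) and once with label noise.

```python
import numpy as np


def minibatch_noise_at_minimum(label_noise_std, n=2000, d=5,
                               batch_size=32, n_batches=500, seed=0):
    """Trace of the minibatch-gradient covariance at the least-squares minimum.

    Illustrative assumption: linear regression with squared loss, where the
    per-sample gradient is (w^T x_i - y_i) * x_i.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + label_noise_std * rng.normal(size=n)

    # Full-batch minimizer of the squared loss.
    w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Per-sample gradients evaluated at the minimizer, shape (n, d).
    residuals = X @ w_star - y
    per_sample_grads = residuals[:, None] * X
    full_grad = per_sample_grads.mean(axis=0)

    # Draw minibatches without replacement and record each gradient's
    # deviation from the full-batch gradient.
    noise = np.stack([
        per_sample_grads[rng.choice(n, size=batch_size, replace=False)].mean(axis=0)
        - full_grad
        for _ in range(n_batches)
    ])
    return float(np.trace(np.cov(noise, rowvar=False)))


if __name__ == "__main__":
    print("clean labels:", minibatch_noise_at_minimum(0.0))
    print("noisy labels:", minibatch_noise_at_minimum(1.0))
```

With clean labels every residual vanishes at the minimum, so each per-sample gradient is zero up to floating-point error and the minibatch noise disappears. With label noise the residuals stay nonzero and the measured noise is finite. This covers only the label-noise case in this toy model; regularization and underparametrization are not simulated here.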
