Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments

Michael R. Metel

Motivated by neural network training in low-bit floating-point and fixed-point environments, this work studies the convergence of variants of SGD with computational error. For a general stochastic Lipschitz continuous loss function, a novel convergence result to a Clarke stationary point is presented, assuming that only an approximation of its stochastic gradient can be computed and that there is error in computing the SGD step itself. Different variants of SGD are then tested empirically in a variety of low-precision arithmetic environments, achieving improved test-set accuracy compared to SGD on two image recognition tasks.
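The abstract does not spell out an arithmetic model. As a rough illustration of the setting it describes (an approximate stochastic gradient plus error in the SGD step), here is a minimal sketch. It assumes a simulated fixed-point format with grid spacing 2**-frac_bits and uses a toy 1-Lipschitz nonsmooth loss, f(x) = E|x - xi| with xi ~ Uniform(2, 4), whose minimizer is the median x* = 3. The names `quantize` and `low_precision_sgd`, the loss, the step-size schedule and the rounding model are all hypothetical choices made for this example. They are not the paper's experiments and not its SGD variants.

```python
import math
import random
from typing import Optional


def quantize(x: float, frac_bits: int, rng: Optional[random.Random] = None) -> float:
    """Map x onto a fixed-point grid with spacing 2**-frac_bits (overflow is ignored).

    With rng=None this is round-to-nearest (Python's round sends ties to even).
    Otherwise it is stochastic rounding, which is unbiased in expectation.
    """
    scale = 2 ** frac_bits
    scaled = x * scale
    if rng is None:
        return round(scaled) / scale
    lower = math.floor(scaled)
    # Round up with probability equal to the fractional part, so E[result] = x.
    if rng.random() < scaled - lower:
        lower += 1
    return lower / scale


def low_precision_sgd(steps: int, frac_bits: int, seed: int = 0,
                      stochastic: bool = True) -> float:
    """Hypothetical toy: minimize f(x) = E|x - xi|, xi ~ Uniform(2, 4), with SGD
    carried out in simulated fixed point. Returns the final iterate."""
    rng = random.Random(seed)
    qrng = rng if stochastic else None  # one RNG drives both sampling and rounding
    x = 0.0  # the starting point lies on the grid
    for k in range(steps):
        xi = rng.uniform(2.0, 4.0)
        # Approximate stochastic Clarke subgradient, evaluated on a quantized sample.
        diff = x - quantize(xi, frac_bits, qrng)
        g = (diff > 0) - (diff < 0)  # sign(diff); 0 at the kink lies in [-1, 1]
        alpha = 0.5 / math.sqrt(k + 1)  # assumed diminishing step size
        # Error in the SGD step itself: the update and the new iterate are both rounded.
        step = quantize(alpha * g, frac_bits, qrng)
        x = quantize(x - step, frac_bits, qrng)
    return x
```

For example, `low_precision_sgd(5000, frac_bits=8)` should settle near the median 3, and every iterate stays a multiple of 2**-8. Stochastic rounding is the default here because, under round-to-nearest, any step smaller than half the grid spacing rounds to zero and the iterate stops moving. Rounding up with probability equal to the fractional part keeps each rounded step unbiased.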
