A Convergence Analysis of Nonlinearly Constrained ADMM in Deep Learning

Jinshan Zeng, Shao-Bo Lin, Yuan Yao

Efficient training of deep neural networks (DNNs) is challenging because of the highly nonconvex optimization involved. The alternating direction method of multipliers (ADMM) has attracted increasing attention in deep learning for its potential for distributed computing. However, establishing the convergence of ADMM in DNN training remains an open problem because of the nonlinear constraints involved. In this paper, we provide an answer to this problem by establishing the convergence of a nonlinearly constrained ADMM for DNNs with smooth activations. Specifically, we establish global convergence to a Karush-Kuhn-Tucker (KKT) point at an ${\cal O}(1/k)$ rate. The key development in achieving this goal is a new local linear approximation technique, which enables us to overcome the hurdle posed by the nonlinear constraints in ADMM for DNNs.
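To make the setting concrete, a common constrained reformulation underlying ADMM-type DNN training (shown here only as a generic sketch under standard notation; the paper's exact formulation may differ) writes an $L$-layer network with weights $W_\ell$, smooth activations $\sigma_\ell$, and auxiliary layer outputs $a_\ell$ as
$$\min_{\{W_\ell\},\{a_\ell\}} \; \mathcal{L}(a_L, y) \quad \text{s.t.} \quad a_\ell = \sigma_\ell(W_\ell a_{\ell-1}), \quad \ell = 1,\dots,L,$$
with ADMM applied to the augmented Lagrangian
$$\mathcal{L}_\beta = \mathcal{L}(a_L, y) + \sum_{\ell=1}^{L} \langle \lambda_\ell,\, a_\ell - \sigma_\ell(W_\ell a_{\ell-1}) \rangle + \frac{\beta}{2}\sum_{\ell=1}^{L} \big\| a_\ell - \sigma_\ell(W_\ell a_{\ell-1}) \big\|^2.$$
The constraints $a_\ell = \sigma_\ell(W_\ell a_{\ell-1})$ are nonlinear in $(W_\ell, a_{\ell-1})$, which is precisely the obstacle that a local linear approximation (for instance, a first-order Taylor expansion of $\sigma_\ell(W_\ell a_{\ell-1})$ around the current iterate) is designed to remove in each subproblem.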
