Are All Layers Created Equal?

Chiyuan Zhang, Samy Bengio, Yoram Singer

Understanding deep neural networks has been a major research objective in recent years, with notable theoretical progress. A focal point of these studies stems from the success of excessively large networks, which defy the classical wisdom of uniform convergence and learnability. We study empirically the layer-wise functional structure of overparameterized deep models and provide evidence for the heterogeneous characteristics of layers. To do so, we introduce the notion of robustness to post-training re-initialization and re-randomization. We show that layers can be categorized as either "ambient" or "critical". Resetting the ambient layers to their initial values has no negative consequence, and in many cases they barely change throughout training. In contrast, resetting the critical layers completely destroys the predictor and the performance drops to chance. Our study provides further evidence that mere parameter counting or norm accounting is too coarse for studying generalization of deep models, and that flatness or robustness analysis of the models needs to respect the network architecture.
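
As a rough illustration of the re-initialization and re-randomization probes described in the abstract, the sketch below resets a single layer of a trained network and re-measures test accuracy. It assumes PyTorch; the helper name, the saved initial state, the data loader, and the scaling used for re-randomization are illustrative assumptions, not the authors' code.

```python
import copy
import torch


def layer_reset_accuracy(model, init_state, layer_prefix, loader,
                         device="cpu", mode="re-init"):
    """Reset one layer of a trained model and measure test accuracy.

    mode="re-init": restore the layer's parameters to their values at
    initialization (init_state is a state_dict saved before training).
    mode="re-rand": draw fresh random values of the same shape, matching
    the trained parameters' scale (an assumption for this sketch).
    """
    probe = copy.deepcopy(model).to(device).eval()
    state = probe.state_dict()
    for key in state:
        if key.startswith(layer_prefix):
            if mode == "re-init":
                state[key] = init_state[key].clone()
            else:
                state[key] = torch.randn_like(state[key]) * state[key].std()
    probe.load_state_dict(state)

    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            pred = probe(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```

Comparing this accuracy across layers is what distinguishes "ambient" layers (accuracy barely moves under re-initialization) from "critical" ones (accuracy collapses to chance).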
