Network pruning, or network sparsification, has a long history and practical significance in modern applications. The loss surface of a dense neural network can exhibit a poor landscape due to non-convexity and nonlinear activations, yet over-parameterization may lead to benign geometric properties. In this paper, we study sparse networks with the squared loss objective, showing that, like dense networks, sparse networks can preserve a benign landscape when the width of the last hidden layer exceeds the number of training samples. Our results cover general sparse linear networks, linear CNNs (a special class of sparse networks), and nonlinear sparse networks. We also present counterexamples when certain assumptions are violated, implying that these assumptions are necessary for our results.