Nonnegative matrix factorization is a powerful technique to realize dimension reduction and pattern recognition through single-layer data representation learning. Deep learning, however, with its carefully designed hierarchical structure, is able to combine hidden features to form more representative features for pattern recognition. In this paper, we proposed sparse deep nonnegative matrix factorization models to analyze complex data for more accurate classification and better feature interpretation. Such models are designed to learn localized features or generate more discriminative representations for samples in distinct classes by imposing $L_1$-norm penalty on the columns of certain factors. By extending one-layer model into multi-layer one with sparsity, we provided a hierarchical way to analyze big data and extract hidden features intuitively due to nonnegativity. We adopted the Nesterov's accelerated gradient algorithm to accelerate the computing process with the convergence rate of $O(1/k^2)$ after $k$ steps iteration. We also analyzed the computing complexity of our framework to demonstrate their efficiency. To improve the performance of dealing with linearly inseparable data, we also considered to incorporate popular nonlinear functions into this framework and explored their performance. We applied our models onto two benchmarking image datasets, demonstrating our models can achieve competitive or better classification performance and produce intuitive interpretations compared with the typical NMF and competing multi-layer models.