Fine-grained categorization can benefit from part-based features, which reveal subtle visual differences between object categories. Handcrafted features have been widely used for part detection and classification. Although a recent trend seeks to learn such features automatically with deep learning models such as convolutional neural networks (CNNs), training these models, and possibly also testing them, requires manually provided annotations that are costly to obtain. To relax these requirements, we consider in this study a general problem setting in which the raw images come with only object-level class labels for model training, and no other side information is needed. Specifically, by extracting and interpreting the hierarchical hidden-layer features learned by a CNN, we propose a CNN-based system for fine-grained categorization. When evaluated on the Caltech-UCSD Birds-200-2011, FGVC-Aircraft, Cars and Stanford Dogs datasets, under the setting that only object-level class labels are used for training and no other annotations are available for either training or testing, our method achieves performance that is superior or comparable to the state of the art. Moreover, it sheds some light on how the hierarchical features learned by a CNN can be used, an approach with wide applicability well beyond the current fine-grained categorization task.