Diffractive deep neural networks were previously introduced as an optical machine learning framework that uses task-specific diffractive surfaces, designed by deep learning, to perform inference all-optically, achieving promising performance in object classification and imaging. Here we demonstrate systematic improvements in diffractive optical neural networks based on a differential measurement technique that mitigates the non-negativity constraint of light intensity. In this scheme, each class is assigned a separate pair of photodetectors behind a diffractive network, and the inferred class is the one whose detector pair yields the largest normalized signal difference. Moreover, by exploiting the inherent parallelism of optical systems, we reduced the signal coupling between the positive and negative detectors of each class by dividing their optical paths between two jointly trained diffractive neural networks that operate in parallel. We further extended this parallelization approach by dividing individual classes among multiple jointly trained differential diffractive neural networks. Using this class-specific differential detection in jointly optimized diffractive networks, our simulations achieved testing accuracies of 98.52%, 91.48% and 50.82% for the MNIST, Fashion-MNIST and grayscale CIFAR-10 datasets, respectively. Similar to ensemble methods practiced in machine learning, we also independently optimized multiple differential diffractive networks that optically project their light onto a common detector plane, achieving testing accuracies of 98.59%, 91.06% and 51.44% for MNIST, Fashion-MNIST and grayscale CIFAR-10, respectively. Through these systematic advances in the design of diffractive neural networks, the reported classification accuracies set the state of the art for all-optical neural network designs.
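To illustrate the class-specific differential readout, the following minimal Python sketch scores each class from the intensities on its positive and negative detectors and picks the highest score. It assumes the normalization divides the difference by the pair's total power, (I+ - I-) / (I+ + I-). The small `eps` guard is an added assumption, and so is the `ensemble_infer` rule, in which detector intensities from independently optimized networks simply add on the shared detector plane. All function names are hypothetical and do not come from the original work.

```python
from typing import List, Sequence


def differential_scores(i_pos: Sequence[float], i_neg: Sequence[float],
                        eps: float = 1e-12) -> List[float]:
    """Per-class normalized difference (I+ - I-) / (I+ + I-).

    Assumption: normalization by the pair's total power; eps avoids
    division by zero when both detectors are dark.
    """
    if len(i_pos) != len(i_neg):
        raise ValueError("need one positive and one negative detector per class")
    return [(p - n) / (p + n + eps) for p, n in zip(i_pos, i_neg)]


def infer_class(i_pos: Sequence[float], i_neg: Sequence[float]) -> int:
    """Return the index of the class with the largest differential score."""
    scores = differential_scores(i_pos, i_neg)
    return max(range(len(scores)), key=scores.__getitem__)


def ensemble_infer(pos_per_net: Sequence[Sequence[float]],
                   neg_per_net: Sequence[Sequence[float]]) -> int:
    """Hypothetical ensemble readout: sum intensities from all networks
    on each shared detector, then apply the same differential rule."""
    n_classes = len(pos_per_net[0])
    i_pos = [sum(net[c] for net in pos_per_net) for c in range(n_classes)]
    i_neg = [sum(net[c] for net in neg_per_net) for c in range(n_classes)]
    return infer_class(i_pos, i_neg)


if __name__ == "__main__":
    # Three classes: class 1 has the largest normalized positive excess.
    print(infer_class([0.20, 0.55, 0.30], [0.25, 0.15, 0.30]))
```

Because the score is normalized per detector pair, a class whose two detectors both receive a lot of light is not favored simply for being bright. Any positive scaling constant (for example, a training temperature) would leave the argmax unchanged.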