Supervised learning with deep convolutional neural networks (DCNNs) has seen huge adoption in stereo matching. However, the acquisition of large-scale datasets with well-labeled ground truth is cumbersome and labor-intensive, making supervised learning-based approaches often hard to implement in practice. To overcome this drawback, we propose a robust and effective self-supervised stereo matching approach, consisting of a pyramid voting module (PVM) and a novel DCNN architecture, referred to as OptStereo. Specifically, our OptStereo first builds multi-scale cost volumes, and then adopts a recurrent unit to iteratively update disparity estimations at high resolution; while our PVM can generate reliable semi-dense disparity images, which can be employed to supervise OptStereo training. Furthermore, we publish the HKUST-Drive dataset, a large-scale synthetic stereo dataset, collected under different illumination and weather conditions for research purposes. Extensive experimental results demonstrate the effectiveness and efficiency of our self-supervised stereo matching approach on the KITTI Stereo benchmarks and our HKUST-Drive dataset. PVStereo, our best-performing implementation, greatly outperforms all other state-of-the-art self-supervised stereo matching approaches. Our project page is available at sites.google.com/view/pvstereo.