Images captured in the low-light condition suffer from low visibility and various imaging artifacts, e.g., real noise. Existing supervised enlightening algorithms require a large set of pixel-aligned training image pairs, which are hard to prepare in practice. Though weakly-supervised or unsupervised methods can alleviate such challenges without using paired training images, some real-world artifacts inevitably get falsely amplified because of the lack of corresponded supervision. In this paper, instead of using perfectly aligned images for training, we creatively employ the misaligned real-world images as the guidance, which are considerably easier to collect. Specifically, we propose a Cross-Image Disentanglement Network (CIDN) to separately extract cross-image brightness and image-specific content features from low/normal-light images. Based on that, CIDN can simultaneously correct the brightness and suppress image artifacts in the feature domain, which largely increases the robustness to the pixel shifts. Furthermore, we collect a new low-light image enhancement dataset consisting of misaligned training images with real-world corruptions. Experimental results show that our model achieves state-of-the-art performances on both the newly proposed dataset and other popular low-light datasets.