Image-based yield detection in agriculture could raise the harvest efficiency and cultivation performance of farms. With this goal, this research focuses on improving instance segmentation of field crops under varying environmental conditions. Five data sets of cabbage plants were recorded under varying outdoor lighting conditions. The images were acquired using a commercial mono camera. Additionally, depth information was generated from the image stream with Structure-from-Motion (SfM). A Mask R-CNN was used to detect and segment the cabbage heads. The influence of depth information and of different colour space representations was analysed. The results showed that combining depth with colour information increases segmentation accuracy by 7.1%. Describing colour in colour spaces that use light and saturation information, combined with depth information, yielded additional segmentation improvements of 16.5%. The CIELAB colour space combined with a depth information layer showed the best results, achieving a mean average precision of 75.
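
To make the idea of a colour representation plus a depth layer concrete, the following is a minimal sketch of how a CIELAB image could be stacked with a depth map into a four-channel array. It is an illustration under stated assumptions, not the pipeline used in this work: the function names `srgb_to_lab` and `stack_lab_depth` are hypothetical, the input is assumed to be an sRGB image scaled to [0, 1] with a D65 white point, and the depth map is assumed to be already aligned with the image. How the channels were actually normalised and fed to the Mask R-CNN is not specified here.

```python
import numpy as np

# Standard D65 reference white and linear sRGB -> XYZ matrix.
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])


def srgb_to_lab(rgb):
    """Convert an (H, W, 3) sRGB image with values in [0, 1] to CIELAB."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer curve to obtain linear RGB.
    linear = np.where(rgb <= 0.04045,
                      rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalised by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    # Piecewise CIELAB companding function.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3,
                 np.cbrt(xyz),
                 xyz / (3.0 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def stack_lab_depth(rgb, depth):
    """Return an (H, W, 4) array holding L*, a*, b* and a depth layer.

    Assumption: `depth` is an (H, W) map registered to the image.
    """
    lab = srgb_to_lab(rgb)
    depth = np.asarray(depth, dtype=np.float64)
    if depth.shape != lab.shape[:2]:
        raise ValueError("depth map must match the image height and width")
    return np.concatenate([lab, depth[..., None]], axis=-1)
```

A network that consumes such an array needs a first convolution layer with four input channels instead of three; in practice the channels would usually also be rescaled to comparable ranges before training.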