Researchers have proposed that deep learning, which is providing important progress in a wide range of high complexity tasks, might inspire new insights into learning in the brain. However, the methods used for deep learning by artificial neural networks are biologically unrealistic and would need to be replaced by biologically realistic counterparts. Previous biologically plausible reinforcement learning rules, like AGREL and AuGMEnT, showed promising results but focused on shallow networks with three layers. Will these learning rules also generalize to networks with more layers and can they handle tasks of higher complexity? We demonstrate the learning scheme on classical and hard image-classification benchmarks, namely MNIST, CIFAR10 and CIFAR100, cast as direct reward tasks, both for fully connected, convolutional and locally connected architectures. We show that our learning rule - Q-AGREL - performs comparably to supervised learning via error-backpropagation, with this type of trial-and-error reinforcement learning requiring only 1.5-2.5 times more epochs, even when classifying 100 different classes as in CIFAR100. Our results provide new insights into how deep learning may be implemented in the brain.