Robust localisation and identification of vertebrae, jointly termed vertebrae labelling, in computed tomography (CT) images is an essential component of automated spine analysis. Current approaches for this task mostly work with 3D scans and are comprised of a sequence of multiple networks. Contrarily, our approach relies only on 2D reformations, enabling us to design an end-to-end trainable, standalone network. Our contribution includes: (1) Inspired by the workflow of human experts, a novel butterfly-shaped network architecture (termed Btrfly net) that efficiently combines information across sufficiently-informative sagittal and coronal reformations. (2) Two adversarial training regimes that encode an anatomical prior of the spine's shape into the Btrfly net, each enforcing the prior in a distinct manner. We evaluate our approach on a public benchmarking dataset of 302 CT scans achieving a performance comparable to state-of-art methods (identification rate of $>$88%) without any post-processing stages. Addressing its translation to clinical settings, an in-house dataset of 65 CT scans with a higher data variability is introduced, where we discuss refinements that render our approach robust to such scenarios.