Musical scales are used in cultures throughout the world, but the question of how they evolved remains open. Some suggest that scales based on the harmonic series are inherently pleasant, while others propose that scales are chosen because they are easy to sing, hear and reproduce accurately. However, testing these theories has been hindered by the sparseness of empirical evidence. Here, to enable such examination, we assimilate data from diverse ethnomusicological sources into a cross-cultural database of scales. We generate populations of scales based on proposed and alternative theories and assess their similarity to empirical distributions from the database. Most scales can be explained as tending to include intervals that roughly correspond to perfect fifths ("imperfect fifths"), and packing arguments explain the salient features of the distributions. Scales are also preferred if their intervals are compressible, which could facilitate efficient communication and memory of melodies. While no single theory can explain all scales, which appear to evolve according to different selection pressures, the simplest harmonicity-based, imperfect-fifths packing model best fits the empirical data.