In this paper, we study data-dependent generalization error bounds exhibiting a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as regularizer. Key to our analysis are new structural results for multi-class Gaussian complexities and empirical $\ell_\infty$-norm covering numbers, which exploit the Lipschitz continuity of the loss function with respect to the $\ell_2$- and $\ell_\infty$-norm, respectively. We establish data-dependent error bounds in terms of complexities of a linear function class defined on a finite set induced by training examples, for which we show tight lower and upper bounds. We apply the results to several prominent multi-class learning machines, exhibiting a tighter dependency on the number of classes than the state of the art. For instance, for the multi-class SVM by Crammer and Singer (2002), we obtain a data-dependent bound with a logarithmic dependency which significantly improves the previous square-root dependency. Experimental results are reported to verify the effectiveness of our theoretical findings.