In this paper we propose Fed-ensemble: a simple approach that bringsmodel ensembling to federated learning (FL). Instead of aggregating localmodels to update a single global model, Fed-ensemble uses random permutations to update a group of K models and then obtains predictions through model averaging. Fed-ensemble can be readily utilized within established FL methods and does not impose a computational overhead as it only requires one of the K models to be sent to a client in each communication round. Theoretically, we show that predictions on newdata from all K models belong to the same predictive posterior distribution under a neural tangent kernel regime. This result in turn sheds light onthe generalization advantages of model averaging. We also illustrate thatFed-ensemble has an elegant Bayesian interpretation. Empirical results show that our model has superior performance over several FL algorithms,on a wide range of data sets, and excels in heterogeneous settings often encountered in FL applications.