Federated learning enables a cluster of decentralized mobile devices at the edge to collaboratively train a shared machine learning model, while keeping all the raw training samples on device. This decentralized training approach is demonstrated as a practical solution to mitigate the risk of privacy leakage. However, enabling efficient FL deployment at the edge is challenging because of non-IID training data distribution, wide system heterogeneity and stochastic-varying runtime effects in the field. This paper jointly optimizes time-to-convergence and energy efficiency of state-of-the-art FL use cases by taking into account the stochastic nature of edge execution. We propose AutoFL by tailor-designing a reinforcement learning algorithm that learns and determines which K participant devices and per-device execution targets for each FL model aggregation round in the presence of stochastic runtime variance, system and data heterogeneity. By considering the unique characteristics of FL edge deployment judiciously, AutoFL achieves 3.6 times faster model convergence time and 4.7 and 5.2 times higher energy efficiency for local clients and globally over the cluster of K participants, respectively.