We consider distributed machine learning at the wireless edge, where a parameter server builds a global model with the help of multiple wireless edge devices that perform computations on local dataset partitions. Edge devices transmit the results of their computations (updates of the current global model) to the server at a fixed rate using orthogonal multiple access over an error-prone wireless channel. In case of a transmission error, the undelivered packet is retransmitted until it is successfully decoded at the receiver. Leveraging the fundamental tradeoff between computation and communication in distributed systems, we aim to determine how many edge devices are needed to minimize the average completion time while guaranteeing convergence. We provide upper and lower bounds on the average completion time and derive a necessary condition for adding edge devices in two asymptotic regimes, namely the large dataset regime and the high accuracy regime. Experiments on real datasets and numerical results confirm our analysis and substantiate our claim that the number of edge devices should be carefully selected for timely distributed edge learning.