This paper considers the \textit{minimum spanning tree (MST)} problem in the Congested Clique model and presents an algorithm that runs in $O(\log \log \log n)$ rounds, with high probability. Prior to this, the fastest MST algorithm in this model was a deterministic algorithm due to Lotker et al.~(SIAM J on Comp, 2005) from about a decade ago. A key step along the way to designing this MST algorithm is a \textit{connectivity verification} algorithm that not only runs in $O(\log \log \log n)$ rounds with high probability, but also has low message complexity. This allows the fast computation of an MST by running multiple instances of the connectivity verification algorithm in parallel. These results depend on a new edge-sampling theorem, developed in the paper, that says that if each edge $e = \{u, v\}$ is sampled independently with probability $c \log^2 n/\min\{\mbox{degree}(u), \mbox{degree}(v)\}$ (for a large enough constant $c$) then all cuts of size at least $n$ are approximated in the sampled graph. This sampling theorem is inspired by series of papers on graph sparsification via random edge sampling due to Karger~(STOC 1994), Bencz\'ur and Karger~(STOC 1996, arxiv 2002), and Fung et al.~(STOC 2011). The edge sampling techniques in these papers use probabilities that are functions of edge-connectivity or a related measure called edge-strength. For the purposes of this paper, these edge-connectivity measures seem too costly to compute and the main technical contribution of this paper is to show that degree-based edge-sampling suffices to approximate large cuts.