This two-part paper considers an uplink massive device communication scenario in which a large number of devices are connected to a base station (BS), but device traffic is sporadic, so that in any given coherence interval only a subset of the devices is active. The objective is to quantify the cost of active device detection and channel estimation, and to characterize the overall achievable rate of a grant-free two-phase access scheme in which device activity detection and channel estimation are performed jointly using pilot sequences in the first phase and data is transmitted in the second phase. To accommodate a large number of simultaneously transmitting devices, this paper studies an asymptotic regime in which the BS is equipped with a massive number of antennas. The main contributions of Part I of this paper are as follows. First, we note that, because the pool of potentially active devices is large but the coherence time is limited, the pilot sequences cannot all be orthogonal. Despite this non-orthogonality, this paper shows that in the asymptotic massive multiple-input multiple-output (MIMO) regime, both the missed detection and the false alarm probabilities for activity detection can always be made to go to zero by utilizing compressed sensing techniques that exploit sparsity in the device activity pattern. Part II of this paper further characterizes the achievable rates of the proposed scheme and quantifies the cost, in terms of achievable rate, of using non-orthogonal pilot sequences for channel estimation.
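As an informal illustration of the first-phase detection problem, the following minimal sketch simulates non-orthogonal pilots, sparse device activity, and a multi-antenna BS, and then recovers the active set. Everything in it is an assumption made for illustration rather than the setup analyzed in the paper: the i.i.d. complex Gaussian pilots, the i.i.d. Rayleigh channels without large-scale fading, the parameter values, the function names `simulate_activity_detection` and `somp`, and the detector itself, which is simultaneous orthogonal matching pursuit (SOMP) with the number of active devices assumed known, substituted here as a simple stand-in for the compressed sensing techniques mentioned above.

```python
import numpy as np


def simulate_activity_detection(n_devices=200, n_active=5, pilot_len=50,
                                n_antennas=64, noise_std=0.05, seed=0):
    """Generate one pilot phase, Y = A X + W (all parameters are illustrative)."""
    rng = np.random.default_rng(seed)

    def cgauss(shape):
        # Circularly symmetric complex Gaussian entries with unit variance.
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    # pilot_len < n_devices, so the pilot columns cannot all be orthogonal.
    A = cgauss((pilot_len, n_devices)) / np.sqrt(pilot_len)

    # Sparse activity: only n_active rows of X are nonzero.
    active = np.sort(rng.choice(n_devices, size=n_active, replace=False))
    X = np.zeros((n_devices, n_antennas), dtype=complex)
    X[active] = cgauss((n_active, n_antennas))  # assumed Rayleigh channels

    W = noise_std * cgauss((pilot_len, n_antennas))
    return A, A @ X + W, active


def somp(A, Y, n_active):
    """Simultaneous OMP: greedily pick the pilots that best explain Y."""
    support = []
    residual = Y.copy()
    for _ in range(n_active):
        # Correlate every pilot with the residual, pooling over all antennas.
        scores = np.linalg.norm(A.conj().T @ residual, axis=1)
        if support:
            scores[support] = 0.0  # never pick the same device twice
        support.append(int(np.argmax(scores)))
        # Least-squares channel estimate on the current support.
        coef, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        residual = Y - A[:, support] @ coef
    return np.sort(np.array(support))


A, Y, active = simulate_activity_detection()
detected = somp(A, Y, n_active=len(active))
missed = np.setdiff1d(active, detected)        # active but not detected
false_alarms = np.setdiff1d(detected, active)  # detected but not active
print("missed:", missed, "false alarms:", false_alarms)
```

Rerunning the sketch with a smaller `n_antennas` or a shorter `pilot_len` gives a rough feel for how the two error types depend on the array size and pilot length; it is not a substitute for the asymptotic analysis.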