We prove a lower estimate on the increase in entropy when two copies of a conditional random variable $X | Y$, with $X$ supported on $\mathbb{Z}_q=\{0,1,\dots,q-1\}$ for prime $q$, are summed modulo $q$. Specifically, given two i.i.d copies $(X_1,Y_1)$ and $(X_2,Y_2)$ of a pair of random variables $(X,Y)$, with $X$ taking values in $\mathbb{Z}_q$, we show \[ H(X_1 + X_2 \mid Y_1, Y_2) - H(X|Y) \ge \alpha(q) \cdot H(X|Y) (1-H(X|Y)) \] for some $\alpha(q) > 0$, where $H(\cdot)$ is the normalized (by factor $\log_2 q$) entropy. Our motivation is an effective analysis of the finite-length behavior of polar codes, and the assumption of $q$ being prime is necessary. For $X$ supported on infinite groups without a finite subgroup and no conditioning, a sumset inequality for the absolute increase in (unnormalized) entropy was shown by Tao (2010). We use our sumset inequality to analyze Ar{\i}kan's construction of polar codes and prove that for any $q$-ary source $X$, where $q$ is any fixed prime, and any $\epsilon > 0$, polar codes allow {\em efficient} data compression of $N$ i.i.d. copies of $X$ into $(H(X)+\epsilon)N$ $q$-ary symbols, as soon as $N$ is polynomially large in $1/\epsilon$. We can get capacity-achieving source codes with similar guarantees for composite alphabets, by factoring $q$ into primes and combining different polar codes for each prime in factorization. A consequence of our result for noisy channel coding is that for {\em all} discrete memoryless channels, there are explicit codes enabling reliable communication within $\epsilon > 0$ of the symmetric Shannon capacity for a block length and decoding complexity bounded by a polynomial in $1/\epsilon$. The result was previously shown for the special case of binary input channels (Guruswami-Xia '13 and Hassani-Alishahi-Urbanke '13), and this work extends the result to channels over any alphabet.