We consider an underlay cognitive radio network in which the secondary user (SU) harvests energy from the environment. We assume a slotted mode of operation in which each slot of the SU is used for either energy harvesting or data transmission. Considering block fading with memory, we model the energy arrival and fading processes as stationary first-order Markov processes. We propose a harvest-or-transmit policy for the SU, together with optimal transmit powers, that maximizes its expected throughput under three different settings. First, we adopt a learning-theoretic approach that assumes no a priori knowledge of the underlying Markov processes; in this case, we obtain an online policy using Q-learning. Second, we assume that full statistical knowledge of the governing Markov processes is available a priori; under this assumption, we obtain an optimal online policy using infinite-horizon stochastic dynamic programming. Third, we obtain an optimal offline policy using the generalized Benders decomposition algorithm; the offline policy assumes that, for a given time deadline, the energy arrivals and channel states are known in advance at all the transmitters. Finally, we compare all the policies and study the effects of various system parameters on the system performance.
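To make the learning-theoretic setting concrete, the following is a minimal tabular Q-learning sketch of a harvest-or-transmit decision. It is an illustration under loudly stated assumptions, not the authors' formulation. The battery is discretized into integer energy units (`B_MAX`); the energy-arrival and channel-gain processes are hypothetical two-state first-order Markov chains (`P_E`, `P_G`, `E_VALUES`, `G_VALUES`); action `0` means harvest and action `k >= 1` means transmit using `k` units. The power cap `P_MAX` is a hypothetical stand-in for the underlay interference constraint, and the per-slot reward `log2(1 + k*g)` assumes unit noise power. The discount factor `gamma` is also an assumption. The agent never reads the transition matrices; it learns only from sampled transitions.

```python
import math
import random

# Hypothetical toy model: every number below is illustrative, not from the paper.
B_MAX = 4                 # battery capacity in energy units
P_MAX = 2                 # max units spent per transmit slot (stand-in for interference limit)
E_VALUES = [0, 1]         # energy harvested per slot in each energy-arrival state
G_VALUES = [0.5, 2.0]     # channel power gain in each fading state
# First-order Markov transition matrices (row = current state, column = next state).
P_E = [[0.7, 0.3], [0.2, 0.8]]
P_G = [[0.6, 0.4], [0.3, 0.7]]


def next_markov(state, matrix, rng):
    """Sample the next state of a two-state Markov chain."""
    return 0 if rng.random() < matrix[state][0] else 1


def feasible_actions(battery):
    """Action 0 = harvest; action k >= 1 = transmit using k stored energy units."""
    return [0] + list(range(1, min(battery, P_MAX) + 1))


def step(state, action, rng):
    """Apply one slot of the toy environment; returns (next_state, reward)."""
    battery, e_state, g_state = state
    if action == 0:
        reward = 0.0  # harvesting slot: no data sent
        battery = min(B_MAX, battery + E_VALUES[e_state])
    else:
        reward = math.log2(1.0 + action * G_VALUES[g_state])  # assumed unit noise power
        battery -= action
    return (battery, next_markov(e_state, P_E, rng), next_markov(g_state, P_G, rng)), reward


def q_learning(episodes=2000, horizon=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning; the chains' statistics are never used directly."""
    rng = random.Random(seed)
    q = {}

    def qv(s, a):
        return q.get((s, a), 0.0)

    for _ in range(episodes):
        state = (0, rng.randrange(2), rng.randrange(2))  # start with an empty battery
        for _ in range(horizon):
            actions = feasible_actions(state[0])
            if rng.random() < eps:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: qv(state, a))  # exploit
            nxt, reward = step(state, action, rng)
            best_next = max(qv(nxt, a) for a in feasible_actions(nxt[0]))
            # Standard Q-learning update toward the one-step bootstrapped target.
            q[(state, action)] = qv(state, action) + alpha * (
                reward + gamma * best_next - qv(state, action)
            )
            state = nxt
    return q


def greedy_policy(q, state):
    """Harvest-or-transmit decision learned for a (battery, energy, channel) state."""
    return max(feasible_actions(state[0]), key=lambda a: q.get((state, a), 0.0))
```

For example, `q = q_learning()` followed by `greedy_policy(q, (3, 1, 1))` returns the learned action for a battery holding three units under a high-arrival, high-gain state. A tabular design is used here only because the toy state space is tiny; the paper's actual state and action spaces may differ.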