In this paper, we propose a network scenario in which the baseband processes of virtual small cells, powered solely by energy harvesters and batteries, can be opportunistically executed in a grid-connected edge computing server co-located at the macro base station site. We state the corresponding energy minimization problem and propose multi-agent Reinforcement Learning (RL) to solve it. Two on-line algorithms, distributed Fuzzy Q-Learning and Q-Learning, are tailored to this purpose. Coordination among the agents is achieved by broadcasting system-level information to the independent learners. The evaluation of network performance confirms that coordination via broadcasting can achieve higher system-level gains than uncoordinated solutions, with cumulative rewards closer to the off-line bounds. Finally, our analysis allows us to evaluate the benefits of continuous state/action representation for the learning algorithms in terms of faster convergence, higher cumulative reward, and better adaptation to changing environments.
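To illustrate the idea of independent learners coordinated through a broadcast signal, the following is a minimal sketch of a tabular Q-learning agent whose state is augmented with a system-level value received by broadcast. It is not the algorithm evaluated in this paper: the class name `BroadcastQLearner`, the two-action set (0 = process the baseband locally, 1 = offload to the edge server), the discretized broadcast level, and all hyperparameters are assumptions chosen for illustration only.

```python
import random
from collections import defaultdict


class BroadcastQLearner:
    """Independent tabular Q-learner (hypothetical sketch).

    The agent's state is the pair (local_state, broadcast), where
    `broadcast` is an assumed discretized system-level value that
    every agent receives from a central broadcaster.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha        # learning rate (assumed value)
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration probability (assumed value)
        # Q-table: unseen states start with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, local_state, broadcast):
        """Epsilon-greedy action selection on the augmented state."""
        state = (local_state, broadcast)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, local_state, broadcast, action, reward,
               next_local_state, next_broadcast):
        """Standard one-step Q-learning update on the augmented state."""
        state = (local_state, broadcast)
        next_state = (next_local_state, next_broadcast)
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


# Hypothetical usage: one agent per small cell, all sharing the same
# broadcast value but keeping separate Q-tables.
agents = [BroadcastQLearner(n_actions=2, seed=i) for i in range(3)]
broadcast_level = 1  # assumed discretized system-level indicator
actions = [agent.act(local_state=0, broadcast=broadcast_level) for agent in agents]
```

Each agent learns from its own reward and never observes the other agents' actions; the only shared information is the broadcast value, which is what distinguishes this setup from fully uncoordinated independent learners.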