In this paper, we use Model Predictive Control (MPC)-based Reinforcement Learning (RL) to find the optimal policy for a multi-agent battery storage system. We consider a time-varying prediction of the power price together with production-demand uncertainty. We optimize an economic cost while avoiding very low or very high states of charge, which can damage the battery. We account for the bounded power that the main grid can provide and for the constraints on the power input and state of each agent. A parametrized MPC scheme serves as the function approximator for the deterministic policy gradient method, and RL optimizes the closed-loop performance by updating the MPC parameters. Simulation results demonstrate that the proposed method handles the constraints and delivers the optimal policy.
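
As a minimal sketch of the parameter update described above (the notation is illustrative and assumed, not taken from the paper body): let $\theta$ denote the MPC parameters, $s$ the system state, and $\pi_\theta(s)$ the first input of the MPC solution computed at $s$. For a closed-loop cost $J(\pi_\theta)$ with action-value function $Q_{\pi_\theta}$, the standard deterministic policy gradient and a gradient step with an assumed step size $\alpha > 0$ read:

\[
\nabla_\theta J(\pi_\theta) = \mathbb{E}_{s}\left[ \nabla_\theta \pi_\theta(s)\, \nabla_a Q_{\pi_\theta}(s,a)\big|_{a=\pi_\theta(s)} \right],
\qquad
\theta \leftarrow \theta - \alpha\, \nabla_\theta J(\pi_\theta).
\]

In a scheme of this kind, $\nabla_\theta \pi_\theta(s)$ would typically be obtained from the sensitivity of the MPC solution with respect to $\theta$, and the minus sign in the update reflects that $J$ is a cost to be minimized.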