The use of radio-frequency (RF) signals for wireless sensing has gained increasing attention. However, unwanted multi-path fading in uncontrollable radio environments limits the accuracy of RF sensing. Instead of passively adapting to the environment, in this paper we consider a scenario in which an intelligent metasurface is deployed to sense the existence and locations of 3D objects. By programming its beamformer patterns, the metasurface can provide desirable propagation properties. Achieving high sensing accuracy is nevertheless challenging, since it requires jointly optimizing the beamformer patterns and the mapping from the received signals to the sensing outcome. To tackle this challenge, we formulate an optimization problem that minimizes the cross-entropy loss of the sensing outcome, and propose a deep reinforcement learning algorithm that jointly computes the optimal beamformer patterns and the mapping of the received signals. Simulation results verify the effectiveness of the proposed algorithm and show how the sizes of the metasurface and the target space influence the sensing accuracy.
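
To make the cross-entropy objective concrete, the following minimal sketch computes a mean binary cross-entropy over a discretized target space. It rests on assumptions that are not stated in this abstract: the target space is split into cells, each cell is either empty or occupied, and the mapping of the received signals outputs one occupancy probability per cell. The function name `cross_entropy_loss` and the example values are hypothetical and serve only as an illustration, not as the formulation used in the paper.

```python
import math


def cross_entropy_loss(occupied, predicted_prob, eps=1e-12):
    """Mean binary cross-entropy over the cells of a discretized target space.

    occupied:       0/1 ground-truth labels, one per cell (assumed layout)
    predicted_prob: estimated probability that each cell holds an object
    """
    if not occupied:
        raise ValueError("at least one cell is required")
    if len(occupied) != len(predicted_prob):
        raise ValueError("labels and predictions must have equal length")
    total = 0.0
    for y, p in zip(occupied, predicted_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(occupied)


# Hypothetical four-cell target space: cells 0 and 3 contain an object.
truth = [1, 0, 0, 1]
confident = [0.9, 0.1, 0.2, 0.8]   # predictions close to the ground truth
uninformed = [0.5, 0.5, 0.5, 0.5]  # predictions carrying no information

loss_confident = cross_entropy_loss(truth, confident)
loss_uninformed = cross_entropy_loss(truth, uninformed)
```

Under these assumptions, predictions closer to the true occupancy yield a lower loss than uninformed ones, which is the quantity a joint optimization of the beamformer patterns and the signal mapping would aim to reduce.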