When agents are swarmed to carry out a mission, there is often a sudden failure of some of the agents observed from the command base. It is generally difficult to distinguish whether the failure is caused by actuators (hypothesis, $h_a$) or sensors (hypothesis, $h_s$) solely by the communication between the command base and the concerning agent. By making a collision to the agent by another, we would be able to distinguish which hypothesis is likely: For $h_a$, we expect to detect corresponding displacements while for $h_a$ we do not. Such swarm strategies to grasp the situation are preferably to be generated autonomously by artificial intelligence (AI). Preferable actions ($e.g.$, the collision) for the distinction would be those maximizing the difference between the expected behaviors for each hypothesis, as a value function. Such actions exist, however, only very sparsely in the whole possibilities, for which the conventional search based on gradient methods does not make sense. Instead, we have successfully applied the reinforcement learning technique, achieving the maximization of such a sparse value function. The machine learning actually concluded autonomously the colliding action to distinguish the hypothesises. Getting recognized an agent with actuator error by the action, the agents behave as if other ones want to assist the malfunctioning one to achieve a given mission.