Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner's dilemma

Yuki Usui, Masahiko Ueda

We investigate the repeated prisoner's dilemma game in which both players alternately use reinforcement learning to obtain their optimal memory-one strategies. We theoretically solve the joint Bellman optimality equations of reinforcement learning. We find that, among the sixteen deterministic memory-one strategies, the Win-stay Lose-shift strategy, the Grim strategy, and the strategy that always defects can each form a symmetric equilibrium of the mutual reinforcement learning process.
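To make the strategy space concrete, the following is a minimal sketch that enumerates the sixteen deterministic memory-one strategies and picks out the three named in the abstract. It does not reproduce the reinforcement learning analysis or the Bellman equations. The state encoding (own previous action followed by the opponent's previous action), the helper names `all_strategies` and `self_play`, and the memory-one definition of Grim as "cooperate only after mutual cooperation" are assumptions made for illustration, not taken from the paper.

```python
from itertools import product

# Assumed encoding: each state is (own previous action, opponent's previous action).
STATES = ("CC", "CD", "DC", "DD")


def all_strategies():
    """Return every deterministic memory-one strategy as a state -> action dict."""
    return [dict(zip(STATES, actions)) for actions in product("CD", repeat=len(STATES))]


# The three strategies named in the abstract, under the assumed encoding.
NAMED = {
    "WSLS": {"CC": "C", "CD": "D", "DC": "D", "DD": "C"},  # Win-stay Lose-shift
    "Grim": {"CC": "C", "CD": "D", "DC": "D", "DD": "D"},  # cooperate only after CC
    "ALLD": {"CC": "D", "CD": "D", "DC": "D", "DD": "D"},  # always defect
}


def self_play(strategy, start, rounds=6):
    """Play a strategy against a copy of itself and return the visited states."""
    state = start
    history = [state]
    for _ in range(rounds):
        mine = strategy[state]
        theirs = strategy[state[::-1]]  # the opponent sees the mirrored state
        state = mine + theirs
        history.append(state)
    return history


strategies = all_strategies()
print(len(strategies))
for name, strategy in NAMED.items():
    print(name, strategy in strategies, self_play(strategy, "CD"))
```

In this sketch, a symmetric pairing simply means both players use the same lookup table; whether such a pairing is an equilibrium of the learning process is the question the paper addresses and is not checked here.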
