Revisiting Discrete Soft Actor-Critic

Haibin Zhou, Zichuan Lin, Junyou Li, Deheng Ye, Qiang Fu, Wei Yang

We study the adaption of soft actor-critic (SAC)from continuous action space to discrete action space. We revisit vanilla SAC and provide an in-depth understanding of its Q value underestimation and performance instability issues when applied to discrete settings. We thereby propose entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action space, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at:

Knowledge Graph



Sign up or login to leave a comment