A Novel Greedy-Step Bellman Optimality Equation for Efficient Value Propagation

Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan

Efficiently propagating credit to responsible actions is a central and challenging task in reinforcement learning. To accelerate information propagation, this paper presents a new method that bridges a highway that allows unimpeded information to flow across long horizons. The key to our method is a newly proposed Bellman equation, called Greedy-Step Bellman Optimality Equation, through which the high-credit information can fast propagate across a long horizon. We theoretically show that the solution of the new equation is exactly the optimal value function and the corresponding operator converges faster than the classical operator. Besides, it leads to a new multi-step off-policy algorithm, which is capable of safely utilizing any off-policy data collected by the arbitrary policy. Experiments reveal that the proposed method is reliable, easy to implement. Moreover, without employing additional components of Rainbow except Double DQN, our method achieves competitive performance with Rainbow on the benchmark tasks.

Knowledge Graph



Sign up or login to leave a comment