Hybrid Adversarial Inverse Reinforcement Learning

Mingqi Yuan, Man-On Pun, Yi Chen, Qi Cao

In this paper, we investigate the problem of the inverse reinforcement learning (IRL), especially the beyond-demonstrator (BD) IRL. The BD-IRL aims to not only imitate the expert policy but also extrapolate BD policy based on finite demonstrations of the expert. Currently, most of the BD-IRL algorithms are two-stage, which first infer a reward function then learn the policy via reinforcement learning (RL). Because of the two separate procedures, the two-stage algorithms have high computation complexity and lack robustness. To overcome these flaw, we propose a BD-IRL framework entitled hybrid adversarial inverse reinforcement learning (HAIRL), which successfully integrates the imitation and exploration into one procedure. The simulation results show that the HAIRL is more efficient and robust when compared with other similar state-of-the-art (SOTA) algorithms.

Knowledge Graph



Sign up or login to leave a comment