A Performance Bound for Model Based Online Reinforcement Learning

Lukas Beckenbach, Stefan Streif

Model-based reinforcement learning (RL) refers to an approximate optimal control design for infinite-horizon (IH) problems that aims to approximate the optimal IH controller and its associated cost parametrically. In online RL, the respective approximators are trained along the actual system trajectory (potentially in addition to offline data). While stability results exist for online RL, the IH controller performance has been addressed only in a fragmentary manner, and the parametric, error-prone nature of the approximation is rarely considered explicitly, even in the model-based case. To assess the performance in such a case, this work uses a model predictive control framework to mimic an online RL controller. More precisely, the optimization-based controller is equipped with an online-adapted approximate cost that serves as the terminal cost function. The results include a stability statement and a performance estimate for the control and training scheme, and they demonstrate how the controller's performance bound depends on the error resulting from the parameterized cost approximation.
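The role of the terminal cost can be illustrated with a minimal sketch. It is not the authors' scheme: it assumes a hypothetical scalar linear system x+ = a*x + b*u with quadratic stage cost q*x^2 + r*u^2, a terminal cost approximation of the form theta*x^2, and a fixed theta per run (the online adaptation of theta is omitted). All names and numerical values are illustrative assumptions. The sketch compares the closed-loop IH cost of a short-horizon predictive controller for several terminal weights with the optimal IH cost.

```python
# Illustrative sketch only: scalar LQ problem, hypothetical parameters.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0   # assumed system and cost parameters
HORIZON = 2                        # assumed prediction horizon


def riccati_step(p, a, b, q, r):
    """One backward Riccati step; returns (earlier cost weight, gain k with u = -k*x)."""
    k = a * b * p / (r + b * b * p)
    p_prev = q + a * a * p - a * b * p * k
    return p_prev, k


def solve_dare(a, b, q, r, iters=500):
    """Optimal IH cost weight, obtained by iterating the Riccati step to its fixed point."""
    p = q
    for _ in range(iters):
        p, _ = riccati_step(p, a, b, q, r)
    return p


def mpc_gain(theta, horizon, a, b, q, r):
    """First-step feedback gain of the horizon-step problem with terminal cost theta*x^2."""
    p, k = theta, 0.0
    for _ in range(horizon):
        p, k = riccati_step(p, a, b, q, r)
    return k


def closed_loop_cost(k, a, b, q, r, x0=1.0, steps=200):
    """Accumulated stage cost of the closed loop under u = -k*x (truncated IH cost)."""
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost


P_STAR = solve_dare(A, B, Q, R)  # optimal IH cost weight for x0 = 1

if __name__ == "__main__":
    # Terminal weights with increasing approximation error relative to P_STAR.
    for theta in (P_STAR, 0.5 * P_STAR, 0.0):
        k = mpc_gain(theta, HORIZON, A, B, Q, R)
        cost = closed_loop_cost(k, A, B, Q, R)
        print(f"theta={theta:.4f}  gain={k:.4f}  cost={cost:.4f}  gap={cost - P_STAR:.4f}")
```

In this toy setting, an exact terminal weight (theta equal to the optimal IH weight) reproduces the optimal feedback, while a larger error in theta yields a larger gap between the achieved and the optimal IH cost. This mirrors, in a much simpler form, the dependence of the performance bound on the cost approximation error described above.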
