Equivalency of Optimality Criteria of Markov Decision Process and Model Predictive Control

Arash Bahari Kordabad, Mario Zanon, Sebastien Gros

This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), whether discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even when the OCP is based on an inexact model. This is achieved by selecting a proper stage cost and terminal cost for the OCP. A particularly useful special case of such an OCP is a Model Predictive Control (MPC) scheme in which a deterministic (possibly nonlinear) model is used to limit the computational complexity. In practice, Reinforcement Learning algorithms can then be used to tune the parameterized MPC scheme. We verify the proposed theorems analytically in an LQR case and investigate further nonlinear examples in simulation.
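The LQR claim can be illustrated numerically. Below is a minimal sketch of the classical fact that a discounted LQR problem is equivalent to an undiscounted one whose dynamics are scaled by the square root of the discount factor; note that the paper's construction instead modifies the stage and terminal costs, so this is a related, well-known illustration rather than the authors' method. The system matrices A, B, the weights Q, R, and the discount gamma are all assumed for illustration.

```python
import numpy as np

def riccati_fixed_point(A, B, Q, R, gamma=1.0, iters=2000):
    """Value-iterate the discounted Riccati recursion to its fixed point.

    Returns the cost-to-go matrix P (V(x) = x' P x) and the optimal
    feedback gain K (u = -K x).
    """
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + gamma * B.T @ P @ B, gamma * B.T @ P @ A)
        P = Q + gamma * A.T @ P @ A - K.T @ (R + gamma * B.T @ P @ B) @ K
    return P, K

# Illustrative (assumed) double-integrator system and weights -- not taken
# from the paper.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
gamma = 0.9

# Discounted infinite-horizon solution ...
P_disc, K_disc = riccati_fixed_point(A, B, Q, R, gamma=gamma)

# ... coincides with the UNdiscounted solution for the scaled dynamics
# x+ = sqrt(gamma) * (A x + B u).
s = np.sqrt(gamma)
P_undisc, K_undisc = riccati_fixed_point(s * A, s * B, Q, R, gamma=1.0)

assert np.allclose(P_disc, P_undisc) and np.allclose(K_disc, K_undisc)
print("shared optimal cost-to-go matrix P:\n", P_disc)
```

Under these assumptions the two Riccati recursions are algebraically identical at every iteration, so the value functions and feedback gains coincide exactly; this is the kind of criterion matching that the paper generalizes beyond LQR by adjusting the OCP's stage and terminal costs rather than the model.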
