Model-based controlled learning of MDP policies with an application to lost-sales inventory control

Willem van Jaarsveld

Recent literature established that neural networks can represent good MDP policies across a range of stochastic dynamic models in supply chain and logistics. To overcome limitations of the model-free algorithms typically employed to learn/find such neural network policies, a model-based algorithm is proposed that incorporates variance reduction techniques. For the classical lost sales inventory model, the algorithm learns neural network policies that are superior to those learned using model-free algorithms, while also outperforming heuristic benchmarks. The algorithm may be an interesting candidate to apply to other stochastic dynamic problems in supply chain and logistics.

Knowledge Graph



Sign up or login to leave a comment