Situational Awareness by Risk-Conscious Skills

Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

Hierarchical Reinforcement Learning has been previously shown to speed up the convergence rate of RL planning algorithms as well as mitigate feature-based model misspecification (Mankowitz et. al. 2016a,b, Bacon 2015). To do so, it utilizes hierarchical abstractions, also known as skills -- a type of temporally extended action (Sutton et. al. 1999) to plan at a higher level, abstracting away from the lower-level details. We incorporate risk sensitivity, also referred to as Situational Awareness (SA), into hierarchical RL for the first time by defining and learning risk aware skills in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). This is achieved using our novel Situational Awareness by Risk-Conscious Skills (SARiCoS) algorithm which comes with a theoretical convergence guarantee. We show in a RoboCup soccer domain that the learned risk aware skills exhibit complex human behaviors such as `time-wasting' in a soccer game. In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

Knowledge Graph

arrow_drop_up

Comments

Sign up or login to leave a comment