#### CESMA: Centralized Expert Supervises Multi-Agents

##### Alex Tong Lin, Mark J. Debord, Katia Estabridis, Gary Hewer, Stanley Osher

We consider the reinforcement learning problem of training multiple agents to maximize a shared reward. In this multi-agent system, each agent seeks to maximize the reward while interacting with the other agents, and the agents may or may not be able to communicate. Typically the agents do not have access to other agents' policies, so each agent observes a non-stationary and partially observable environment. To obtain multiple agents that act in a decentralized manner, we introduce a novel algorithm under the framework of centralized learning but decentralized execution. This training framework first obtains solutions to a multi-agent problem with a single centralized joint-space learner. This centralized expert is then used to guide imitation learning for independent decentralized agents. The framework is flexible: any reinforcement learning algorithm can be used to obtain the expert, and any imitation learning algorithm can be used to obtain the decentralized agents. This is in contrast to other multi-agent learning algorithms that can, for example, require more specific structures. We present some theoretical error bounds for our method, and we show that one can obtain decentralized solutions to a multi-agent problem through imitation learning.
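The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the centralized expert is replaced by a fixed hand-written rule (`expert_joint_policy`) instead of a trained reinforcement learner, the environment (`env_step`) is a hypothetical toy, and the imitation learner is a simple tabular majority vote trained with a DAgger-style data aggregation loop, which is only one of the many imitation algorithms the framework allows.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy sizes: 2 agents, 3 local observations, 3 actions each.
N_AGENTS, N_OBS, N_ACTIONS = 2, 3, 3


def expert_joint_policy(joint_obs):
    """Stand-in for the centralized expert (assumed already trained).

    It sees the full joint observation and returns a joint action.
    The rule used here is arbitrary and purely illustrative.
    """
    return tuple((o + 1) % N_ACTIONS for o in joint_obs)


def env_step(joint_obs, joint_action):
    """Hypothetical toy dynamics: an agent's next observation depends on
    its own action and on the other agent's current observation."""
    return tuple(
        (joint_action[i] + joint_obs[(i + 1) % N_AGENTS]) % N_OBS
        for i in range(N_AGENTS)
    )


def decentralized_act(policies, joint_obs, rng):
    """Each agent picks an action from its OWN local observation only."""
    actions = []
    for i, obs in enumerate(joint_obs):
        if obs in policies[i]:
            actions.append(policies[i][obs])
        else:
            # Unseen observation: fall back to a random action.
            actions.append(rng.randrange(N_ACTIONS))
    return tuple(actions)


def fit_decentralized(dataset):
    """Tabular imitation: per agent, map each local observation to the
    expert action most often recorded for it."""
    votes = [defaultdict(Counter) for _ in range(N_AGENTS)]
    for joint_obs, expert_action in dataset:
        for i in range(N_AGENTS):
            votes[i][joint_obs[i]][expert_action[i]] += 1
    return [
        {obs: counts.most_common(1)[0][0] for obs, counts in agent_votes.items()}
        for agent_votes in votes
    ]


def train_decentralized_agents(n_rounds=5, episodes=20, horizon=5, seed=0):
    """DAgger-style loop: roll out the decentralized agents, label the
    visited joint observations with the expert, aggregate, and refit."""
    rng = random.Random(seed)
    dataset = []
    policies = [{} for _ in range(N_AGENTS)]
    for _ in range(n_rounds):
        for _ in range(episodes):
            joint_obs = tuple(rng.randrange(N_OBS) for _ in range(N_AGENTS))
            for _ in range(horizon):
                # The expert supervises: it labels states the learners visit.
                dataset.append((joint_obs, expert_joint_policy(joint_obs)))
                joint_action = decentralized_act(policies, joint_obs, rng)
                joint_obs = env_step(joint_obs, joint_action)
        policies = fit_decentralized(dataset)
    return policies


if __name__ == "__main__":
    learned = train_decentralized_agents()
    for i, table in enumerate(learned):
        print(f"agent {i}: {dict(sorted(table.items()))}")
```

In this toy setting the expert's action for each agent happens to depend only on that agent's own observation, so the decentralized tables should be able to reproduce the expert exactly. When the expert relies on information that an individual agent cannot observe, some imitation error is unavoidable, which is the kind of gap the error bounds mentioned in the abstract are concerned with.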
