Decentralized partially observable Markov decision processes (Dec-POMDPs) are becoming increasingly popular in applications ranging from decentralized control of fleets of autonomous vehicles to that of smart grids. Optimally solving Dec-POMDPs is notoriously hard, as illustrated by various counterexamples, including Witsenhausen's counterexample and the Whittle–Rudge counterexample; the complexity of finding the best history-based policies is NEXP-complete. Agent-state based policies have emerged as a popular paradigm to address some of these challenges. In this talk, we review the existing solution approaches for finding optimal agent-state based policies and present a novel policy search algorithm that has a monotonic improvement guarantee and converges to a locally optimal solution. We conclude by presenting experimental results showing that the proposed algorithm identifies close-to-optimal policies on various POMDP and Dec-POMDP benchmarks.

Joint work with Amit Sinha and Matthieu Geist.

Bio: Aditya Mahajan is Professor of Electrical and Computer Engineering at McGill University, Montreal, Canada. He is a member of the McGill Center of Intelligent Machines (CIM), Mila – Québec AI Institute, the International Laboratory for Learning Systems (ILLS), and the Groupe d'études et de recherche en analyse des décisions (GERAD). He is the recipient of the 2015 George Axelby Outstanding Paper Award, the 2016 NSERC Discovery Accelerator Award, the 2014 CDC Best Student Paper Award (as supervisor), and the 2016 NecSys Best Student Paper Award (as supervisor). His principal research interests include decentralized stochastic control, team theory, reinforcement learning, multi-armed bandits, and information theory.