Scalable algorithms for sequential decision making under uncertainty in multiagent systems
Sonu, Ekhlas Shaikh
Decision making, or planning, is a crucial part of AI research. In recent times, deploying autonomous agents – such as search and rescue robots, autonomous unmanned vehicles, and planetary rovers – has been proposed as a feasible approach in many real-world scenarios where sending humans is deemed either too risky or too expensive. Often, these agents must operate under uncertainty, which may arise from the dynamics of the environment or from inaccuracies inherent in the agent's sensors and actuators. To accomplish its goal most efficiently, an autonomous agent must compute an optimal plan for itself. Decision making in partially observable stochastic settings is formalized by partially observable Markov decision processes (POMDPs). Interactive partially observable Markov decision processes (I-POMDPs) generalize POMDPs to multiagent settings, where the goal is to compute the optimal policy for an individual agent operating in the presence of other agents whose goals and preferences may not align with those of the subject agent. Multiagent decision making under uncertainty suffers from several sources of intractability, arising from the uncertainty inherent in the environment: stochastic state transitions, noisy or partial observations, and uncertainty about the goals, capabilities, and beliefs of the other interacting agents. While POMDPs allow an agent to act rationally in single-agent settings, and most POMDP approximation algorithms generalize to multiagent frameworks such as I-POMDPs, additional effort is needed to tackle problems specific to multiagent settings. In my dissertation, I propose several techniques that target the sources of intractability that plague multiagent decision-making problems formalized by I-POMDPs.
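To make the POMDP formalism concrete, the following is a minimal sketch of the standard POMDP belief update, which underlies both POMDP and I-POMDP solvers; the models and numbers below are illustrative assumptions, not the dissertation's code.

```python
import numpy as np

# Standard POMDP belief update: b'(s') ∝ O(o | s', a) * Σ_s T(s' | s, a) b(s).
def belief_update(belief, action, obs, T, O):
    """belief: (S,) prior over states.
    T: (A, S, S) transition model, T[a, s, s2] = P(s2 | s, a).
    O: (A, S, Z) observation model, O[a, s2, o] = P(o | s2, a).
    Returns the normalized posterior belief."""
    predicted = belief @ T[action]           # Σ_s T[a, s, s'] * b(s)
    unnormalized = O[action, :, obs] * predicted
    return unnormalized / unnormalized.sum()

# Tiny two-state, one-action, two-observation example (illustrative numbers).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b = np.array([0.5, 0.5])
posterior = belief_update(b, action=0, obs=0, T=T, O=O)
```

Observing `obs=0`, which is more likely in state 0, shifts the posterior mass toward state 0.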
The first approach is a simple enhancement to existing POMDP algorithms that limits the observation space in certain contexts, yielding quicker solutions. The second is a generalized policy iteration technique that restricts the exponential growth of the model space, thereby speeding up the computation of a locally optimal policy. Third, I propose an online bimodal approach in which the agent behaves as a POMDP under high uncertainty, treating other agents as static noise, but switches to full I-POMDP mode once it has received sufficient information about the current physical state of the environment; this asymptotically improves runtime. Finally, I target the exponential growth of I-POMDP solution complexity with the number of interacting agents. This source of intractability is overcome by exploiting commonly found problem structures such as anonymity and context-specific independence, enabling the solution of problems involving thousands of interacting agents in under six hours.
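The bimodal switching idea can be sketched as follows; the entropy test, threshold, and toy policies are assumptions made for illustration, not the dissertation's actual algorithm.

```python
import numpy as np

def entropy(belief):
    """Shannon entropy (bits) of a belief over physical states."""
    p = belief[belief > 0]
    return -(p * np.log2(p)).sum()

def choose_action(belief, pomdp_policy, ipomdp_policy, threshold=0.5):
    """While the belief over the physical state is diffuse (entropy above a
    threshold), act with the cheaper POMDP policy that folds other agents
    into transition noise; once the belief sharpens, switch to the full
    I-POMDP policy that explicitly models the other agents."""
    if entropy(belief) > threshold:
        return pomdp_policy(belief)
    return ipomdp_policy(belief)

# Hypothetical stand-in policies for demonstration.
pomdp = lambda b: "pomdp-action"
ipomdp = lambda b: "ipomdp-action"

uncertain = np.array([0.5, 0.5])     # entropy = 1.0 bit  -> POMDP mode
certain = np.array([0.95, 0.05])     # entropy ~ 0.29 bit -> I-POMDP mode
```

The design intuition is that deep interactive reasoning pays off only once the agent knows enough about the physical state for other agents' behavior to matter.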
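A back-of-the-envelope illustration of why anonymity helps (the counting argument below is a standard one and an assumption about the exploited structure, not the dissertation's exact model): when the dynamics depend only on how many agents take each action, not on which agents do, the joint action space collapses from |A|^N profiles to the number of action-count configurations.

```python
from math import comb

def joint_profiles(num_agents, num_actions):
    """Joint action profiles when agent identities matter: |A|**N."""
    return num_actions ** num_agents

def count_configurations(num_agents, num_actions):
    """Distinct action-count vectors under anonymity: multisets of size N
    over |A| actions, i.e. C(N + |A| - 1, |A| - 1) by stars and bars."""
    return comb(num_agents + num_actions - 1, num_actions - 1)

# With 1000 agents and 3 actions, the identity-aware joint space has
# 3**1000 profiles, but only C(1002, 2) = 501501 count configurations.
configs = count_configurations(1000, 3)
```

This polynomial (rather than exponential) growth in the number of agents is what makes problems with thousands of interacting agents tractable.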