Agent Modelling in Partially Observable Domains

Pradeep Varakantham, Rajiv Maheswaran, Milind Tambe
University of Southern California
{varakant, maheswar, tambe}@usc.edu

Abstract

Monitoring selectivity is a key challenge faced by agents when modelling other agents (1): agents cannot continually monitor others because of the computational burden of such monitoring and modelling, but a lack of monitoring and modelling increases uncertainty about the state of other agents. Monitoring selectivity is also crucially important when agents plan in the presence of action and observation uncertainty. Formally, this paper focuses on an agent that uses a POMDP to plan its activities in a multiagent setting, and illustrates the critical nature of the monitoring selectivity challenge in POMDPs. The paper presents heuristics to limit the amount of monitoring and modelling of other agents; the heuristics exploit the reward structure and transition probabilities to automatically determine where to curtail such monitoring and modelling. We concretely illustrate our techniques in the domain of software personal assistants, and present some initial experimental results illustrating the efficiency of our approach.

1. Introduction

Agents in a dynamic, partially observable and collaborative multi-agent environment, as in a team setting, must monitor their peers to execute individual and group plans. Monitoring peers is of paramount importance in teams, since team members rely on each other and work closely on related tasks. A key question is how much monitoring of other agents' states is required for effective planning: the Monitoring Selectivity Problem (1). Regardless of the monitoring method, bandwidth and computational limitations prohibit a monitoring agent from monitoring all other agents all the time (Jennings, 1995; Durfee, 1995; Grosz & Kraus, 1996).
However, reducing the monitoring of other agents can introduce uncertainty about their precise state, degrading the agent's performance, e.g., as it makes decisions to assist them or to ask for assistance.

The Monitoring Selectivity Problem arises from three different costs:

1. The cost of observing other agents (at execution time).
2. The cost of modelling other agents.
3. The cost of planning while taking the model of the other agents into account.

Here we concentrate on the third aspect of monitoring selectivity, where the aim is to improve planning efficiency.

We address this aspect of the monitoring selectivity problem in the context of POMDPs, illustrating (i) that the monitoring selectivity problem arises when agents must plan activities in environments involving action and observational uncertainty, and (ii) techniques to address such monitoring selectivity problems in the context of POMDPs. POMDPs are used on account of the observational uncertainty present in the domain (uncertainty in obtaining the status of a task) and to reason about the problem of Adjustable Autonomy (AA). Other agents are modelled by reasoning about the state of the other agent using its observations. Such modelling results in a bigger POMDP with more states and observations. This is the problem of monitoring selectivity in POMDPs: being able to deal with a larger number of states and observations.

In this paper, the question of monitoring selectivity is investigated in the context of software personal assistants. Task rescheduling is the specific aspect under consideration. Agents monitor the tasks being done by their users in order to complete them with minimum tardiness. Agents help the users with decisions on when and how to reallocate tasks when the tasks are not progressing well.
Agents interact with their users to learn their status on the tasks, or to obtain their decisions at points of high uncertainty (about the status).

This paper provides two methods for dealing with the monitoring selectivity problem: team plan based modelling, and state and observation space reduction heuristics. Team plan based modelling exploits the fact that the modelling required varies with the dependencies between the various tasks. Reduction heuristics decrease the number of states and observations in the