IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 3, JUNE 2010

The Environment Value of an Opponent Model

Brett J. Borghetti

Abstract—We develop an upper bound for the potential performance improvement of an agent using a best response to a model of an opponent instead of an uninformed game-theoretic equilibrium strategy. We show that the bound is a function of only the domain structure of an adversarial environment and does not depend on the actual actors in the environment. This bounds-finding technique will enable system designers to determine whether, and what type of, opponent models would be profitable in a given adversarial environment. It also gives them a baseline value against which to compare the performance of instantiated opponent models. We study this method in two domains: selecting intelligence collection priorities for convoy defense and determining the value of predicting enemy decisions in a simplified war game.

Index Terms—Equilibrium, game theory, multiagent systems, opponent model.

I. INTRODUCTION

Suppose that we are planning to send a convoy through dangerous territory in a hostile area. There is a chance that we will encounter an ambush. We would like to minimize our risk while ensuring that the cargo is delivered. We have two roads to choose from: road A, which is the shortest and most direct route to the destination, and road B, which is significantly longer but still leads to the destination. Because of the time sensitivity of the cargo, the travel time to the destination is important. Based on our estimates, the enemy may have planned up to two ambushes, but we are not sure whether they have planned any on road A, road B, or both. We also know, based on a probabilistic analysis of past events, that each ambush succeeds with probability p. If we had access to intelligence sources that could reveal more information about the enemy's activities, we would be able to plan our convoy route better.
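The per-road risk in this scenario can be sketched with a few lines of code. Assuming (as an illustration, not from the paper) that ambushes succeed independently with probability p, the probability that the convoy passes safely along a road with k ambushes is (1 - p)^k:

```python
def p_safe(ambushes_on_road: int, p: float) -> float:
    """Probability the convoy survives every ambush on its chosen road,
    assuming each ambush succeeds independently with probability p."""
    return (1.0 - p) ** ambushes_on_road

p = 0.3  # assumed per-ambush success probability (illustrative value)
# Enemy placements: (ambushes on road A, ambushes on road B), up to two total.
for a_count, b_count in [(0, 0), (1, 0), (1, 1), (2, 0), (0, 2)]:
    print(f"A={a_count} B={b_count}: "
          f"safe via A={p_safe(a_count, p):.2f}, "
          f"safe via B={p_safe(b_count, p):.2f}")
```

The convoy's best road choice clearly depends on where the ambushes are, which is exactly the information the intelligence capabilities below could provide.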
Fortunately, we do have access to two potential intelligence sources: capability X, which we could use to determine the number of teams the enemy has available for ambushes (0, 1, or 2), and capability Y, which could persistently observe road A and report if and when the enemy sets up an ambush on that road (road B is far too long to observe completely). Each of these capabilities is fairly accurate (although not perfect). Unfortunately, due to high demand, we have been authorized to use only one of these intelligence capabilities. Our goal is to choose which intelligence capability (X or Y) to use. This paper describes a computational method for determining answers to such questions.

Borrowing language from intelligent agents, we can define the enemy as an agent, the number of ambush teams that they have available as their state, and the locations (road A or B) that they plan to ambush as their action. We define an agent's policy as a function that maps its states to actions, P : S → A, where S is the set of all possible states and A is the set of all possible actions. An agent model is a function that predicts something about the agent. One class of models predicts the likelihood of each action that the agent might take.

Manuscript received December 21, 2008; revised July 29, 2009. First published November 3, 2009; current version published June 16, 2010. The views expressed in this paper are those of the author and do not reflect the official policy or position of the U.S. Air Force, the Department of Defense, or the U.S. Government. This paper was recommended by Associate Editor T. Vasilakos. The author is with the Department of Electrical and Computer Engineering, Air Force Institute of Technology, Dayton, OH 45433 USA (e-mail: brett.borghetti@afit.edu). Digital Object Identifier 10.1109/TSMCB.2009.2033703
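The policy definition P : S → A can be made concrete with a minimal sketch. The state and action names below are illustrative assumptions for the convoy scenario, not definitions from the paper:

```python
from typing import Dict

State = int   # number of ambush teams the enemy has available (0, 1, or 2)
Action = str  # where the enemy places ambushes (illustrative labels)

# A deterministic policy is simply a mapping from each state to an action.
enemy_policy: Dict[State, Action] = {
    0: "none",   # no teams available: cannot ambush
    1: "A",      # one team: ambush the shorter road (assumed behavior)
    2: "A+B",    # two teams: ambush both roads (assumed behavior)
}

def act(policy: Dict[State, Action], state: State) -> Action:
    """Apply policy P : S -> A to the current state."""
    return policy[state]
```

An opponent model, by contrast, does not know this policy; it only predicts something about the agent, as formalized next.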
The model takes a subset of the features of a state as input and provides a probability distribution w_a over the possible actions, i.e., M : w_a, where a is the set of possible actions in the state and w is a probability distribution over the n actions such that Σ_{i=1}^{n} w_i = 1. Another type of model predicts the likelihood of the agent being in each state it could be in, i.e., M : w_s, where w is a probability distribution over the m states s and Σ_{i=1}^{m} w_i = 1.

Since intelligence capabilities X and Y both have the potential to reveal information about the enemy's activities, the information they provide can be treated as if it were given by opponent models. Since intelligence capability X provides (a probability distribution over) the number of ambush teams that the enemy has, it predicts the enemy's state, i.e., X : w_s. Capability Y is equivalent to a model that predicts the (probability distribution over the) likelihood of an ambush team residing on road A; it (partially) predicts the enemy's action, i.e., Y : w_a.

In the agent-based literature, the active entities are the agents, and the world in which they perceive and take action is sometimes called the environment. Mathematically, the environment is a function that maps states (of the world) and actions (of the agents) to successor states, i.e., Env : S_t × A_t → S_{t+1}. Note that environments are often nondeterministic: the mapping from states and actions to successor states could be
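An action-prediction model of the Y kind can be sketched as a function from observed state features to a distribution over the opponent's actions. The feature names and probabilities below are illustrative assumptions chosen for the convoy scenario, not values from the paper:

```python
from typing import Dict

def action_model(features: Dict[str, int]) -> Dict[str, float]:
    """Toy opponent model M : w_a.

    Maps a subset of state features to a probability distribution over
    the opponent's possible actions. All numbers here are assumptions.
    """
    teams = features.get("teams", 0)
    if teams == 0:
        w = {"none": 1.0, "A": 0.0, "B": 0.0, "A+B": 0.0}
    elif teams == 1:
        w = {"none": 0.1, "A": 0.6, "B": 0.3, "A+B": 0.0}
    else:
        w = {"none": 0.0, "A": 0.2, "B": 0.1, "A+B": 0.7}
    # The defining property of such a model: the weights form a
    # probability distribution, i.e., they sum to 1.
    assert abs(sum(w.values()) - 1.0) < 1e-9
    return w
```

A state-prediction model (the X kind) has the same shape, but its output distribution ranges over the opponent's possible states (0, 1, or 2 teams) rather than its actions.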