Utility-Based Sequential Decision-Making in Evidential Cooperative Multi-Agent Systems

Galina Rogova
Encompass Consulting
Honeoye Falls, NY, U.S.A.
rogova@rochester.rr.com

Carlos Lollett
Computer Science and Engineering Department
University at Buffalo
Buffalo, NY, U.S.A.
clollett@acsu.buffalo.edu

Peter Scott
Computer Science and Engineering Department
University at Buffalo
Buffalo, NY, U.S.A.
peter@cse.buffalo.edu

Abstract - This paper presents a new approach to building utility-based models of decision-making in time-constrained situations with limited resources. A particular hierarchical homogeneous multi-agent architecture has been considered. The proposed system combines agents’ beliefs within the framework of evidence theory and, after each observation, maps the current set of cumulative pignistic probabilities into one of two actions: “defer decision” or “decide hypothesis i”. The system maximizes the expected utility of delayed decisions minus cost. The process of system adaptation to the environment is guided by reinforcement learning. The utilities-from-experts problem is simplified by learning utilities directly from feedback on the quality of the decisions. The results of a case study are presented.

Keywords: Decision utility, Reinforcement learning, Distributed systems, Multi-agent systems, Sequential decision-making, Evidence theory.

1 Introduction

The goal of the research described in this paper is to investigate the problem of sequential decision making with limited resources in time-constrained situations in cooperative multi-agent systems. Timely action in time-constrained situations is needed in many real-world applications, for example, in medical decision making in the emergency room when a patient’s condition is deteriorating but the optimal treatment decision can be made with complete confidence only after expensive, time-consuming tests.
The same considerations apply in target recognition, where additional observations can improve the quality of recognition and help avoid errors, but the cost of delay is very high and there are competing demands on the sensors required for the additional observations. In general, sequential decision making with constrained resources can be considered a part of resource management in which a decision-maker trades off the benefits of additional observations against the cost of delaying the decision and its resulting action.

In this research, which builds on our previous studies [1-3], we consider a hierarchical homogeneous multi-agent system. The agents share a common internal structure, including domain knowledge, a common set of hypotheses to be considered, and a common procedure for assigning a level of belief to each hypothesis. They are able to extract different features from the environment but are unable to communicate directly with one another. They passively acquire information from observations at discrete times $t = 1, 2, \ldots, t^*$, $t^* \le T$, where $t^*$ is the time of selection of any particular hypothesis and $T$ is the deadline by which a classification decision is required. At each time $t \le t^*$, each agent produces beliefs in each hypothesis under consideration and transmits these beliefs to the Fusion Center (FC). The FC combines all the beliefs obtained from the agents up to and including $t$ within the framework of evidence theory [4,5] and produces cumulative pignistic probabilities for each hypothesis. The decision-maker then maps the current set of cumulative pignistic probabilities into one of two actions: “defer decision” or “decide hypothesis $i$”.

The process of system adaptation to the environment is guided by reinforcement learning. Reinforcement learning is the strategy by which an agent learns