Diagnosing a Team of Agents: Scaling-Up

Meir Kalech and Gal A. Kaminka
The MAVERICK Group
Computer Science Department
Bar Ilan University, Israel
{kalechm, galk}@cs.biu.ac.il

ABSTRACT

Agents in a team must be in agreement. Unfortunately, they may come to disagree due to sensing uncertainty, communication failures, etc. Once a disagreement occurs, we should detect the disagreement and diagnose it. Unfortunately, current diagnosis techniques do not scale well with the number of agents, as they have high communication and computation complexity. We suggest three techniques to reduce this complexity: (i) reducing the amount of diagnostic reasoning by sending targeted queries; (ii) using light-weight behavior recognition to recognize which beliefs of the agents might be in conflict; and (iii) grouping the agents according to their role and behavior and then diagnosing the groups based on representative agents. We examine these techniques in large-scale teams, in two domains, and show that combining the techniques produces a diagnosis process which is highly scalable in both communication and computation.

1. INTRODUCTION

Agents in a team must be in agreement as to their goals, plans and at least some of their beliefs [2, 6, 12]. Unfortunately, they may come to disagree due to sensing differences, ambiguity in sensing, communication failures, etc. When this occurs, and given that it is unknown who is correct, a process of diagnosis is needed to determine the subset of beliefs that are at the root of the disagreement. A diagnosis process monitors the agents in order to identify which agents are in disagreement and about what they disagree, so that they can negotiate and argue, to resolve the disagreements [6, 10].

We refer to this kind of diagnosis as social diagnosis, since it focuses on finding causes for inter-agent failures, i.e., failures to maintain relationships between agents in a team.
Social diagnosis stands in contrast to intra-agent diagnosis, which focuses on determining the causes for component failures within agents.

Unfortunately, previous social diagnosis methods do not address large-scale teams, in which both communications and runtime must be tightly managed. Some reduce communication, at the expense of exponential run-times [8]. Others rely on fault models and exceptions (e.g., [7]), which explode combinatorially as the number of agent relations grows. Previous work on large-scale systems did not address social diagnosis, instead focusing on fault detection [9], non-social diagnosis [5, 11], or coordination [4].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AAMAS’05, July 25-29, 2005, Utrecht, Netherlands. Copyright 2005 ACM 1-59593-094-9/05/0007 ...$5.00.

We seek to enable social diagnosis in large-scale teams of behavior-based agents. We first develop techniques which use communications earlier in the diagnosis process (compared to previous work), in an attempt to stave off both the run-time associated with generating diagnostic hypotheses and later communications. These techniques include: (i) using initial queries to alleviate diagnostic reasoning (behavior querying); and (ii) using communications in light-weight behavior recognition to focus on relevant beliefs (shared beliefs).

These “communicate early” techniques enable a third method (grouping) in which the diagnosed agents are divided into groups based on their selected behavior and their role, such that all members of a group are in agreement, and at least one disagreement exists between any two groups.
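As a rough illustration of this partition, the following Python sketch groups agents by a (role, behavior) key and picks one representative per group. The `Agent` record, its field names, the helper functions, and the choice of the first member as representative are all hypothetical and are not taken from the paper; the sketch also assumes that agents sharing both role and selected behavior agree with one another.

```python
from collections import defaultdict
from typing import NamedTuple


class Agent(NamedTuple):
    # Hypothetical record: the field names are assumptions for illustration.
    name: str
    role: str
    behavior: str


def group_agents(agents):
    """Partition agents so that members of a group share role and selected behavior."""
    groups = defaultdict(list)
    for agent in agents:
        groups[(agent.role, agent.behavior)].append(agent)
    return dict(groups)


def pick_representatives(groups):
    """Choose one agent per group (here simply the first member) to be diagnosed."""
    return {key: members[0] for key, members in groups.items()}


team = [
    Agent("a1", "scout", "patrol"),
    Agent("a2", "scout", "patrol"),
    Agent("a3", "scout", "halt"),
    Agent("a4", "attacker", "patrol"),
]

groups = group_agents(team)
representatives = pick_representatives(groups)

for key, rep in representatives.items():
    print(key, "->", rep.name, "stands in for", len(groups[key]), "agent(s)")
```

Under these assumptions, the number of agents to diagnose drops from the team size to the number of distinct (role, behavior) pairs.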
Then, only representative agents of each group are diagnosed, and the results are used for the others in their group.

We empirically examine these techniques in two domains through hundreds of tests, measuring the number of messages and the reasoning runtime. We find that behavior querying reduces both runtime and communications. However, the shared beliefs technique does not scale well. Moreover, when combined, these techniques reduce neither communications nor runtime. Surprisingly, however, the grouping method (which is enabled by this disappointing combination) results in a diagnosis process which is highly scalable in both communication and computation.

2. RELATED WORK

Frohlich et al. [5] and Roos et al. [11] present diagnosis methods for distributed systems, in which a spatially distributed system is divided into regions, each under the responsibility of a diagnosing agent. However, neither work addresses social diagnosis or disagreements.

Horling [7] uses a causal graph-based model of pre-defined failures and diagnoses to detect and respond to multi-agent failures. When a fault is detected, it causes activation of diagnosis results as appropriate. This approach may face difficulties in large teams, since the number of possible social faults can grow combinatorially large.

Kalech and Kaminka [8] focus on diagnosis of disagreements between agents. They show that one can reduce communications by centralizing the diagnosis, so that all the agents send their information to a single pre-defined agent, which compares these beliefs. Moreover, they show that further reducing communications, by inferring other agents' beliefs, is exponential in run time. However, in teams where the number of agents is scaled up, such computation and communication costs are unacceptable.

A related area of work deals with failure detection, rather than