A Universal Criteria Catalog for Evaluation of Heterogeneous Agent Development Artifacts

Lars Braubach, Alexander Pokahr, Winfried Lamersdorf
Distributed Systems and Information Systems
Computer Science Department, University of Hamburg
Vogt-Kölln-Str. 30, D-22527 Hamburg, Germany
{braubach|pokahr|lamersdorf}@informatik.uni-hamburg.de

ABSTRACT
The research discipline of multi-agent systems is characterized by a high degree of heterogeneity. This heterogeneity leads to a vast number of options (e.g. different architectures and languages) for employing agent technology, but it is also a major source of difficulties for its adoption. People interested in using multi-agent systems depend on solid survey articles that clarify and evaluate these different options and explain which choices should be made in which situations. A survey should also propose viable classification means to help readers understand which development artifacts broadly exhibit similar properties. To date, most multi-agent system surveys do without classifications and address only one specific type of artifact, such as agent languages or tools. Often, only the characteristics of the representatives are described, without evaluating them. In this work a universal criteria catalog will be presented that has been defined abstractly enough to be usable for a wide variety of agent development artifacts. It will be shown how this abstract catalog can be further refined with respect to the chosen area of investigation. In addition to the catalog itself, its general usage as part of a survey will be explained and a blueprint for conducting surveys will be presented. To demonstrate its usefulness, excerpts of extensive evaluations, performed in the areas of agent architectures, languages, methodologies, tools and platforms, will be presented.

1. INTRODUCTION
The high heterogeneity of the multi-agent systems (MAS) research field leads to many options for realizing agent applications.
As many of the available solutions (be it methodologies, platforms or other things) are suitable only in specific application contexts, it is very important to have guidelines at hand for selecting the right option with respect to the given problem [9]. One viable instrument for people interested in agent technology is to study surveys and evaluations of specific agent artifacts such as agent architectures or languages. Regrettably, most existing surveys do not contain evaluations of the described artifacts, and available comparisons of artifacts suffer from ad-hoc classifications and selections as well as from non-standardized evaluation criteria. In particular, divergent criteria make it hard to appraise and compare evaluation results, because it remains unclear whether the considered criteria are relevant and whether there are others not discussed at all.

Jung, Michel, Ricci & Petta (eds.): AT2AI-6 Working Notes, From Agent Theory to Agent Implementation, 6th Int. Workshop, May 13, 2008, AAMAS 2008, Estoril, Portugal, EU. Not for citation.

To improve this situation, in this paper a universal criteria catalog is presented that has been deduced from established standards and is sufficiently generic to be utilized for the evaluation of arbitrary agent artifacts. The usage of the catalog fosters several important aspects. Firstly, evaluations of the same artifact type become comparable, making advances in the multi-agent research field visible; e.g. a current platform survey and one from five years ago could show in which areas (e.g. operating ability) progress has been achieved. Secondly, evaluations of different artifact types become comparable. This allows identifying how the state of the art with respect to different artifacts is related and, e.g., in which areas research should be intensified. Thirdly, the criteria catalog has been conceived to be usable in different scenarios.
This only requires conceiving a suitable weighting scheme that emphasizes the different criteria according to their importance with respect to the overall evaluation objective.

Besides the criteria catalog itself, it is also sketched what else needs to be done to obtain significant evaluation results. To this end, a survey blueprint is presented that highlights all important aspects a survey (including an evaluation part) should possess, as well as the rough order in which they should be performed. Additionally, an evaluation process is proposed that describes how to apply the criteria catalog to a concrete evaluation setting, e.g. considering the artifact type that is to be evaluated.

The rest of this paper is structured as follows. In section 2, we give an overview of the types of artifacts related to the development of agent-based systems. Section 3 introduces the universal criteria catalog, which is proposed for the evaluation of such artifacts. An abstract evaluation process is described in section 4. To illustrate the usage of the catalog, excerpts of evaluations performed in the areas of agent architectures and programming languages are presented in sections 5 and 6, respectively. In section 7 we discuss the pros and cons of the presented approach. Section 8 concludes the paper with a summary and an outlook.

2. ARTIFACTS FOR MAS DEVELOPMENT
Agent technology has been applied to and further devel-