Resource Management and Adaptive Replication for Fault-Tolerant MAS

Sylvain Ductor 1, Zahia Guessoum 1,2, and Mikal Ziane 1,3

1 LIP6 - Université Pierre et Marie Curie (Paris 6), 104 avenue du President Kennedy, 75016 Paris, France
2 MODECO-CReSTIC - IUT de Reims, 51687 Reims Cedex 2, France
3 Université Paris Descartes, Paris, France

Abstract. Distributed cooperative applications are now increasingly being designed as MAS. Such applications may be massive, open and very dynamic: agents can join or leave, and they can change roles, strategies, etc. These characteristics create new challenges for traditional approaches to fault tolerance. In this paper, we focus on a replication-based preventive approach. The aim is to dynamically and automatically adapt the agent replication strategy (e.g. the number of replicas and their location) in order to maximize the reliability of the MAS. We describe a negotiation protocol that supports adaptive replication. This protocol provides a distributed solution that uses local decisions but guarantees global MAS performance. We report on experimental results using a first implementation of the protocol.

1 Introduction

Multi-Agent Systems (MAS) have generated considerable excitement in recent years because of their promise as a new paradigm for conceptualizing, designing, and implementing software systems, ranging from manufacturing to process control, air traffic control, and information management. They are particularly attractive for creating software that operates in distributed and open environments, such as the Internet. Being both decentralized and self-organized, these systems consist of autonomous entities called agents that are designed to solve tasks by cooperating with each other.
These systems thus suffer from all the problems associated with building traditional distributed systems, as well as the additional difficulties that arise from having flexible and sophisticated interactions between autonomous and adaptive components. This means that MAS are non-deterministic, and a specific behavior is hard to guarantee, especially in fault situations. Since some changes are unpredictable, there exists no generic way of describing the global state of the MAS. A fault-tolerant infrastructure that can detect failures and adapt to them in order to provide continuity of processing is thus crucial to MAS. To build such an infrastructure, several projects have used replication mechanisms [7], [13]. Replication of data and/or computation is an effective way to achieve fault tolerance in distributed systems. However, replicating every