To appear in the Proceedings of the Second International Workshop on Mobile Agents, MA’98. Germany. September, 1998. Lecture Notes in Computer Science xxxx (http://www.springer.de/comp/lncs/index.html). Copyright © Springer-Verlag.

An Approach for Providing Mobile Agent Fault Tolerance

Flávio M. Assis Silva¹, Radu Popescu-Zeletin
Technical University Berlin/GMD FOKUS
Kaiserin-Augusta Allee 31, 10589 Berlin, Germany
{flavio, zeletin}@fokus.gmd.de

Abstract. This paper presents a fault-tolerance protocol for mobile agent executions that tolerates long-term failures of agencies. If the agency where an agent execution is being performed fails for a long time, the execution can be recovered and continued at another agency. This is not only important for preventing a mobile agent execution from becoming blocked; it also helps enforce the autonomy of organizations that, in an open environment, emit mobile agents to execute applications crossing the boundaries of autonomous organizations. The protocol presented in this paper is based on mobile agent replication and is a variation of the protocol described in [6]. Our protocol differs from the work in [6] mainly in three respects: an agent can execute more than a single atomic transaction at an agency; the protocol integrates distributed storage of recovery information; and it supports partial recovery of the activity carried out at an agency. The motivation for this work is to build support for the execution of open nested transactions with a set of mobile agents.

1 Introduction

Support for reliable mobile agent executions is an important functionality of a mobile agent execution infrastructure. Once issued by its user, a mobile agent (or simply agent) should be able to execute its activity and eventually provide the results of its execution, independently of failures.
In particular, a mobile agent execution should be resilient to long-lasting failures of agencies (the logical "place" where mobile agents execute), so that the execution does not become blocked [1][5][6]. If the agency where an agent is executing fails for a long time, a new copy of the agent should be activated at another agency to recover and continue the activity of the failed agent. Providing such agent fault tolerance allows an agent-based execution to explore alternatives and to reach its goal as quickly as possible.

¹ The work of this author is partially supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), Brazil.

Furthermore, the executions of mobile agents in an environment of autonomous systems thereby become more independent of the particular policies applied at the agencies where they execute, which contributes to enforcing the autonomy of organizations emitting