To appear in the Proceedings of the Second International Workshop on Mobile Agents,
MA’98. Germany. September, 1998. Lecture Notes in Computer Science xxxx
(http://www.springer.de/comp/lncs/index.html). Copyright © Springer-Verlag.
An Approach for Providing
Mobile Agent Fault Tolerance
Flávio M. Assis Silva¹, Radu Popescu-Zeletin
Technical University Berlin/GMD FOKUS
Kaiserin-Augusta Allee 31, 10589 Berlin, Germany
{flavio, zeletin}@fokus.gmd.de
Abstract. This paper presents a fault-tolerance protocol for mobile agent executions
that tolerates long-term failures of agencies. If the agency where an agent execution
is being performed fails for a long time, the execution can be recovered and continued
at another agency. This is important not only for preventing a mobile agent execution
from becoming blocked, but also for enforcing the autonomy of organizations that, in an
open environment, emit mobile agents to execute applications crossing the boundaries of
autonomous organizations. The protocol presented in this paper is based on mobile agent
replication and is a variation of the protocol described in [6]. Our protocol differs
from the work in [6] mainly in three respects: an agent can execute more than a single
atomic transaction at an agency; the protocol integrates distributed storage of recovery
information; and it supports partial recovery of the activity carried out at an agency.
The motivation for this work is to build support for the execution of open nested
transactions with a set of mobile agents.
1 Introduction
Supporting reliable mobile agent executions is an important function of a mobile agent
execution infrastructure. Once issued by its user, a mobile agent (or simply agent)
should be able to execute its activity and eventually provide the results of its
execution, regardless of failures. In particular, a mobile agent execution should be
resilient to long-lasting failures of agencies (the logical "place" where mobile agents
execute), so that the execution does not become blocked [1][5][6]. If the agency where
an agent is executing fails for a long time, a new copy of the agent should be activated
at another agency to recover and continue the activity of the failed agent. Providing
such agent fault tolerance allows an agent-based execution to explore alternatives and
to reach its goal as fast as possible. Furthermore, in this way the executions of mobile
agents in an environment of autonomous systems become more independent of the particular
policies applied at the agencies where mobile agents execute, thereby contributing to
enforcing the autonomy of organizations emitting
¹ The work of this author is partially supported by CNPq (Conselho Nacional de
Desenvolvimento Científico e Tecnológico), Brazil.