A Generic Framework for Mobile Agent’s Fault Tolerance
Bassey E. Isong
Department of Computer Science and Info. Systems,
University of Venda, Private Bag X5050,
Thohoyandou 0950, South Africa.
bassey.isong@univen.ac.za
Obeten O. Ekabua
Department of Computer Science and Info. Systems,
University of Venda, Private Bag X5050,
Thohoyandou 0950, South Africa.
Obeten.ekabua@univen.ac.za
Abstract—Mobile agent executions are prone to failures
originating from bad communication, security attacks, agent
server crashes, unavailability of system resources, network
congestion or even deadlock situations. In such events, mobile
agents either get lost or are damaged (partially or totally)
during execution. Making mobile agents fault tolerant is a
measure taken to increase the dependability and reliability of
agent-based applications. Many approaches have been proposed,
but the majority of existing mobile agent fault tolerance
implementations are designed to tolerate only one or two of the
fault classes (such as communication, crash and agent software
failures) rather than all of them in every situation. This makes
it difficult, if not impossible, to detect and recover from
failures of all types. In this paper, based on an analysis of
existing fault tolerance approaches, we propose a generic fault
tolerance framework that consists of monitoring, planning and
recovery process execution phases and that can help tolerate
failures of all types. The framework is validated using existing
implementation approaches.
Keywords: Mobile Agents, Reliability, Fault Tolerance,
Framework, Checkpointing, Replication
I. INTRODUCTION
Mobile agents have attracted considerable interest in recent
distributed computing research, in both academia and industry.
Mobile agents are encapsulated pieces of software containing
code and data that are able to migrate from one host to another
and perform certain tasks autonomously [3]. They operate in a
distributed computing environment consisting of heterogeneous
devices and platforms. The technology tends to shift
computation towards the data rather than moving the data to the
computation [1]. These distinct characteristics make mobile
agents flexible to deploy, which in turn eases the design,
implementation, and maintenance of distributed systems [2].
Mobile agent technology has been demonstrated in many
application domains, such as network management,
telecommunications, e-commerce, information retrieval,
mobile/pervasive computing, artificial intelligence, workflow
management and Internet computing [5].
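The migration model described above can be illustrated with a toy, single-process sketch in Python. It assumes weak mobility (only the agent's data state is serialized between hops, not its execution stack), and the host names, the `MobileAgent` class and the filtering task are hypothetical; a real platform would ship the agent over the network rather than through an in-memory JSON round trip.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical hosts: each keeps its own data locally; the data never leaves the host.
HOSTS = {
    "host-a": [12, 7, 30],
    "host-b": [5, 41],
    "host-c": [18, 2, 9, 27],
}


@dataclass
class MobileAgent:
    """Toy agent: the visit() method is its code, the fields are its travelling data."""
    itinerary: list
    threshold: int
    results: list = field(default_factory=list)
    visited: list = field(default_factory=list)

    def visit(self, host_name, local_data):
        # The computation runs where the data lives; only matching values travel on.
        self.results.extend(x for x in local_data if x > self.threshold)
        self.visited.append(host_name)


def migrate(agent):
    # Simulated hop: serialize the agent's state as it would be sent over the
    # network, then rebuild the agent at the destination (weak mobility).
    wire = json.dumps(asdict(agent))
    return MobileAgent(**json.loads(wire))


def run(agent, hosts):
    for host_name in list(agent.itinerary):
        agent = migrate(agent)
        agent.visit(host_name, hosts[host_name])
    return agent


agent = run(MobileAgent(itinerary=["host-a", "host-b", "host-c"], threshold=20), HOSTS)
print(agent.results)  # [30, 41, 27]
```

Only the three matching values cross the simulated network, rather than all nine raw readings, which is the "computation towards the data" idea in miniature.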
Research activities on mobile agent technology and its
applications have gained considerable momentum in recent years,
but reliability remains a major concern. Like any other software
system, mobile agents are not immune to failure, especially in
the environments in which they operate. The exponential growth of
distributed heterogeneous environments such as the Internet
inherently exposes mobile agent execution to adverse conditions
[3], [4]. Mobile agents may encounter traditional errors, in
particular those that emerge from failed migration requests,
communication exceptions or security violations [6]. To operate
despite these failures, and for mobile agent technology to gain
solid ground in today's industrial applications, agents have to
be made sufficiently reliable through fault tolerance. Fault
tolerance aims to provide reliable execution of agents, or to
resume service, in the face of system failures [2]. For fault
tolerance in mobile agents to accomplish its goals, reliable
execution of mobile agents must satisfy two properties:
non-blocking and exactly-once execution [7]. Informally,
non-blocking means that the agent's execution can proceed despite
a failure instead of waiting for the failed component to recover,
and exactly-once means that each stage of the agent's execution
takes effect once and only once despite failures and retries.
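The exactly-once property can be made concrete with a minimal sketch. It assumes one common way to obtain it, at-least-once retries combined with a host-side log that deduplicates requests by `(agent_id, stage)`; this is not necessarily the mechanism of [7], and the `Host` class, the `run_stage` helper and the simulated lost acknowledgement are hypothetical. Non-blocking execution usually also requires moving to alternative hosts, which this sketch does not cover.

```python
class Host:
    """Hypothetical agent server that logs every (agent_id, stage) it has executed."""

    def __init__(self, name, lost_acks=0):
        self.name = name
        self.lost_acks = lost_acks  # simulated failures after the work is done
        self.executed = {}          # (agent_id, stage) -> result; assumed durable
        self.side_effects = []      # stands in for a non-idempotent action, e.g. a payment

    def execute(self, agent_id, stage, amount):
        key = (agent_id, stage)
        if key in self.executed:
            # A retried request for a stage that already ran: reply from the log
            # instead of performing the action again.
            return self.executed[key]
        self.side_effects.append(amount)
        self.executed[key] = f"paid {amount}"
        if self.lost_acks > 0:
            # The stage took effect, but the agent never hears about it.
            self.lost_acks -= 1
            raise ConnectionError(f"{self.name}: acknowledgement lost")
        return self.executed[key]


def run_stage(host, agent_id, stage, amount, max_attempts=3):
    """Retry until the host acknowledges; the host's log makes retries harmless."""
    for _ in range(max_attempts):
        try:
            return host.execute(agent_id, stage, amount)
        except ConnectionError:
            continue
    raise RuntimeError(f"stage {stage} not acknowledged after {max_attempts} attempts")


shop = Host("shop", lost_acks=1)
print(run_stage(shop, "agent-1", 0, 50))  # paid 50
print(shop.side_effects)                  # [50], not [50, 50]
```

Without the `executed` log, the retry would append a second payment, giving at-least-once rather than exactly-once semantics.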
Today, several mobile agent fault tolerance approaches exist
across a variety of mobile agent platforms. These approaches
employ different mechanisms to provide reliable agent execution,
especially for failure detection and recovery. Most recent
approaches are optimizations of existing mechanisms, while others
are hybrid or based on exception handling [8], [9], [10]. For
instance, most recent fault tolerance approaches rely on
replication, but each introduces an optimization to the
replication process to either improve performance or lower the
cost of replication.
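Replication-based approaches can be illustrated with a minimal sketch in which one agent stage is executed by several replicas on different hosts and the surviving replicas vote on the result. The host names, the `quote_price` task and the simple majority-with-quorum rule are assumptions for illustration; real schemes differ precisely in how they optimize this step, and replicas would normally run in parallel rather than in the sequential loop used here.

```python
from collections import Counter


def execute_replicated_stage(task, replica_hosts, quorum):
    """Run one agent stage on every replica host and return the majority result.

    task(host) returns the stage result or raises RuntimeError if that host crashes.
    """
    results = []
    for host in replica_hosts:
        try:
            results.append(task(host))
        except RuntimeError:
            # A crashed replica is skipped; the others can still finish the stage.
            continue
    if not results:
        raise RuntimeError("all replicas failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes < quorum:
        raise RuntimeError(f"no result reached a quorum of {quorum}")
    return value


# Hypothetical replicas: h2 crashes and h4 returns a corrupted value.
PRICES = {"h1": 42, "h3": 42, "h4": 41}


def quote_price(host):
    if host == "h2":
        raise RuntimeError("h2 crashed")
    return PRICES[host]


print(execute_replicated_stage(quote_price, ["h1", "h2", "h3", "h4"], quorum=2))  # 42
```

Raising `quorum` trades availability for confidence: with `quorum=3` the same call fails, because only two replicas agree on 42.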
Despite these numerous approaches, fault tolerance in mobile
agents still faces unaddressed challenges that have impeded its
full realization. Most existing mobile agent fault tolerance
implementations are designed to tolerate one, or at most two, of
the failure classes (such as communication, crash and agent
software failures), but not all of them in every situation. This
makes it difficult, if not impossible, for mobile agents to detect
and recover from failures of all types, and calls for a generic
fault tolerance model. In this paper, based on an analysis of
existing fault tolerance approaches, we propose a generic fault
tolerance framework that consists of monitoring, planning and
recovery process execution phases, which can tolerate failures of
all types. The framework is validated using existing
implementation approaches.
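The monitoring, planning and recovery execution phases can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions: it covers the three failure classes named above, and the event fields, the `FailureClass` enum and the recovery actions in `RECOVERY_PLAN` (loosely drawn from retransmission, checkpointing and exception handling) are hypothetical rather than the framework's actual design.

```python
from enum import Enum


class FailureClass(Enum):
    COMMUNICATION = "communication"
    CRASH = "crash"
    SOFTWARE = "agent software"


def monitor(event):
    """Monitoring phase: classify an observed event; None means no failure detected."""
    if event.get("timeout"):
        return FailureClass.COMMUNICATION
    if event.get("host_down"):
        return FailureClass.CRASH
    if event.get("exception"):
        return FailureClass.SOFTWARE
    return None


# Planning phase: a hypothetical policy mapping each failure class to a recovery action.
RECOVERY_PLAN = {
    FailureClass.COMMUNICATION: "retransmit message",
    FailureClass.CRASH: "restart agent from last checkpoint on another host",
    FailureClass.SOFTWARE: "invoke agent exception handler",
}


def plan(failure):
    return RECOVERY_PLAN[failure]


def execute_recovery(action, log):
    """Recovery execution phase: this sketch only records the chosen action."""
    log.append(action)


def handle(event, log):
    failure = monitor(event)
    if failure is None:
        return "ok"
    action = plan(failure)
    execute_recovery(action, log)
    return action


log = []
for event in [{}, {"timeout": True}, {"host_down": True}, {"exception": "ValueError"}]:
    print(handle(event, log))
```

Keeping detection, planning and execution separate means that a new failure class only needs a new classification rule and a new plan entry, which is one way a framework could aim to cover failures of all types.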
II. MOBILE AGENT’S FAULT TOLERANCE
The growing demand for better system performance and for
dependable software components is threatened by faults, which
degrade system reliability. Faults move a system from its normal
execution state into an error state, which in turn results in
system failure [2]. Mobile agents are not shielded from operating
in abnormal situations. They have a certain level of exposure to
faults since they work in distributed environments over the
network and make their
978-1-4577-0174-0/11/$26.00 ©2011 IEEE