A Novel Fault-tolerant Task Scheduling Algorithm
for Computational Grids
Jairam Naik K (a), Satyanarayana N (b)
(a) Research Scholar, Faculty of CSE, JNTU Hyderabad, Andhra Pradesh, India.
(b) Professor, Department of CSE, NGITS Nagole, Hyderabad, Andhra Pradesh, India.
jairam.524@gmail.com, nsn1208@gmail.com
Abstract— A computational grid is a hardware and software
infrastructure that provides consistent, dependable, pervasive
and inexpensive access to high-end computational capabilities in a
multi-institutional virtual organization. Computational grids
provide the computing power needed to execute tasks.
Scheduling tasks in a computational grid is an important problem:
selecting and assigning the best resources for each task requires
a good scheduling algorithm. As grids typically consist of
strongly varying and geographically distributed resources,
choosing a fault-tolerant computational resource is an important
issue. Most fault-tolerant scheduling algorithms select a
resource to execute a task based on its response time and
fault indicator.
This paper proposes a scheduling algorithm that selects
resources according to a new factor called the Scheduling
Success Indicator (SSI). This factor combines the response time,
success rate and predicted experience of grid resources.
Whenever a grid scheduler has tasks to schedule on grid
resources, it uses the SSI to generate its scheduling
decisions. The main scheduling strategy of the fault-tolerant
algorithm is to select resources that have the lowest
tendency to fail and the most experience in task execution.
Extensive simulation experiments are conducted on GridSim, a
Java-based discrete-event grid simulation toolkit, to quantify
the performance of the proposed algorithm. The experiments
show that the proposed algorithm can considerably
improve grid performance in terms of throughput, failure
tendency and worth.
Keywords— Resources, Computational Grid, Fault-tolerant,
success rate, Simulation, Failure tendency, Resource experience
I. INTRODUCTION
Grid computing has evolved over the past years
from theoretical research into a practical
application environment. The availability of
powerful computers, the proliferation of Internet
technology, and high-speed networks as low-cost
commodity components are changing the way we do
large-scale parallel and distributed computing.
Interest in coupling geographically distributed
computational resources to solve large-scale
problems is also growing, leading to what are
popularly called Grid and peer-to-peer (P2P)
computing networks. These enable the sharing,
selection and aggregation of the computational and
data resources required for solving large-scale
problems in science, engineering, and commerce.
The complexity of computational grids mainly
originates from decentralized management and
resource heterogeneity with differing security
policies. These factors often make resources
more likely to fail than in traditional parallel
and distributed systems [3] and cause strong
variations in grid availability.
Also, as applications grow to use more resources
for longer periods of time, they will inevitably
encounter an increasing number of resource
failures [4], which affect the execution of the
tasks assigned to the failed resources. A
fault-tolerant service is therefore important in
grids. Fault tolerance is the ability of a system
to preserve the delivery of expected services
despite the presence of failures within the grid. The
various forms of failure in grid computing
systems include resource failure, network failure,
and application failure [5]. Providing a
fault-tolerant service in a grid environment while
optimizing resource scheduling and task
execution is an important issue. In
computational grids, managing faults is a very
important and challenging problem for grid
application developers [5].
To detect and resolve faults, grid
applications must have fault-tolerant services.
These services should enable applications to
continue their computations on grid resources
without terminating in case of failure. These
services are also required to satisfy the minimum
quality of service (QoS) requirements of
applications, such as the
978-1-4673-2818-0/13/$31.00 ©2013 IEEE