I. J. Computer Network and Information Security, 2012, 5, 29-38
Published Online June 2012 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijcnis.2012.05.04
Copyright © 2012 MECS
Evaluating Overheads of Integrated Multilevel
Checkpointing Algorithms in Cloud Computing
Environment
Dilbag Singh, Jaswinder Singh, Amit Chhabra
Dept. of Computer Science & Engineering, Guru Nanak Dev University Amritsar, Punjab, 143001, India
Dggill2@gmail.com, chhabra.amit78@gmail.com, jaswindersingh@yahoo.com
Abstract — This paper presents a methodology for
providing high availability to the demands of cloud clients. To attain this objective, failover strategies for cloud computing using integrated checkpointing algorithms are proposed. The proposed strategy integrates checkpointing with load balancing algorithms and uses multilevel checkpoints to decrease checkpointing overheads. To implement the proposed failover strategies, a cloud simulation environment is developed that can provide high availability to clients in case of failure or recovery of service nodes. The primary objective of this research work is to improve checkpoint efficiency and prevent checkpointing from becoming the bottleneck of cloud data centers. To find an efficient checkpoint interval, checkpointing overheads are also considered. By varying the rerun time of checkpoints, comparison tables are produced that can be used to find the optimal checkpoint interval.
The proposed failover strategy works at the application layer and provides high availability for the Platform as a Service (PaaS) feature of cloud computing.
Index Terms — Failover, Load balancing, Node-
recovery, Multilevel checkpointing, Restartation
I. INTRODUCTION
Cloud computing [1], [2], [3] is emerging as a powerful way to transform how the IT industry builds and deploys custom applications. In a cloud environment, jobs keep arriving at the data centers for execution; nodes are allocated to jobs according to their requirements, and successfully executed jobs leave the nodes. In this scenario, some nodes may become inactive due to a failure while executing threads. An efficient failover strategy is therefore needed, because a failure may otherwise force the entire job to restart even though some of its threads have already completed successfully on other nodes. When a node fails, that is, when it is no longer accessible to service any client demand, the cloud must migrate its jobs to another node.
A checkpoint is a local state of a job saved on stable storage. By executing checkpointing periodically, one can save the status of a process at regular intervals [17], [18]. If there is a failure, computation can resume from an earlier checkpoint, thereby avoiding restarting execution from the beginning. The process of restarting computation by rolling back to a consistent state is called rollback recovery. In a cloud computing environment, the nodes in the data centers do not share memory [19]; it is therefore necessary to transfer the load of a failed node to other nodes in case of any failure.
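The checkpoint and rollback recovery idea described above can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper: the class `CheckpointedJob`, its methods, and the in-memory `stable_storage` dictionary are hypothetical names chosen for illustration, and a single simulated failure stands in for a node failure.

```python
import copy


class CheckpointedJob:
    """Toy job (hypothetical) that sums 1..total_steps and checkpoints periodically."""

    def __init__(self, total_steps, checkpoint_every):
        self.total_steps = total_steps
        self.checkpoint_every = checkpoint_every
        self.state = {"step": 0, "acc": 0}
        # Assumption: a deep copy kept in memory stands in for stable storage.
        self.stable_storage = copy.deepcopy(self.state)
        self.steps_executed = 0  # counts all work done, including redone work

    def checkpoint(self):
        """Save the current local state of the job."""
        self.stable_storage = copy.deepcopy(self.state)

    def rollback(self):
        """Rollback recovery: restore the last saved consistent state."""
        self.state = copy.deepcopy(self.stable_storage)

    def run(self, fail_at=None):
        """Run to completion, simulating one failure once `fail_at` steps are done."""
        failed = False
        while self.state["step"] < self.total_steps:
            if fail_at is not None and not failed and self.state["step"] == fail_at:
                failed = True
                self.rollback()  # resume from the checkpoint, not from the start
                continue
            self.state["step"] += 1
            self.state["acc"] += self.state["step"]
            self.steps_executed += 1
            if self.state["step"] % self.checkpoint_every == 0:
                self.checkpoint()
        return self.state["acc"]


job = CheckpointedJob(total_steps=10, checkpoint_every=4)
result = job.run(fail_at=7)
print(result, job.steps_executed)
```

With a checkpoint every 4 steps, the failure after step 7 costs only the 3 steps since the last checkpoint; without any checkpoint, all 7 completed steps would have to be redone.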
In this paper, checkpointing is integrated with load balancing algorithms for data centers (cloud computing infrastructure), taking into account several constraints such as infrastructure sharing, availability, failover, and emphasis on customer service. These issues are addressed by proposing a smart failover strategy that provides high availability to client requests. A new cloud simulation environment is proposed, which keeps all nodes busy to achieve load balancing and also executes checkpoints to achieve successful failover.
An integrated checkpointing algorithm executes in parallel with the essential computation. Therefore, the overheads introduced by checkpointing need to be reduced. Checkpointing should enable a cloud service provider (CSP) to provide high availability to client requests in case of failure, which demands frequent checkpointing and therefore introduces significant overheads. Setting the checkpointing rerun time thus becomes critical. Multilevel checkpoints [9], [10], [11], [12], [13], [14], [15] are used in this research work to decrease checkpointing overheads.
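The tension between frequent checkpointing and its overhead can be made concrete with a rough cost model. The sketch below is an illustrative assumption, not the overhead model evaluated in this paper: it borrows the symbols t, C, and R from Table I, and additionally assumes a fixed number of failures, each of which loses on average half a checkpoint interval of work. The function names `estimated_completion_time` and `best_interval` are hypothetical.

```python
import math


def estimated_completion_time(t, interval, C, R, failures):
    """Rough completion-time estimate under simplifying assumptions.

    t         -- time spent on useful computation
    interval  -- computation time between two checkpoints
    C         -- overhead of one checkpoint
    R         -- time required for job migration after a failure
    failures  -- assumed number of failures during the run
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    # No checkpoint is needed after the final segment of work.
    checkpoints = max(math.ceil(t / interval) - 1, 0)
    # Assumption: each failure loses, on average, half an interval of work.
    lost_work = min(interval, t) / 2
    return t + checkpoints * C + failures * (R + lost_work)


def best_interval(t, candidates, C, R, failures):
    """Pick the candidate interval with the lowest estimated completion time."""
    return min(
        candidates,
        key=lambda i: estimated_completion_time(t, i, C, R, failures),
    )


for interval in (5, 20, 100):
    print(interval, estimated_completion_time(100, interval, C=2, R=5, failures=2))
```

Under these assumed values, very short intervals are dominated by checkpoint overhead and very long intervals by redone work, so an intermediate interval gives the lowest estimate.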
A. Parameters and Metrics used in this paper
TABLE I. Parameters and Metrics used in this paper
Parameter name    Meaning
C                 Checkpoint overhead
L                 Checkpoint latency
R                 Time required for job migration
t                 Time spent on computation
t1                Number of times C runs
t2                Number of times R occurs
r                 Checkpointing ratio