DOI: 10.4018/IJSITA.2017100104
International Journal of Strategic Information Technology and Applications
Volume 8 • Issue 4 • October-December 2017
Copyright © 2017, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Dynamic Data Replication Based on Tasks Scheduling for Cloud Computing Environment
Siham Kouidri, Department of Computer Science, Faculty of Exact and Applied Sciences, University Oran1 Ahmed Ben Bella, Oran, Algeria
Belabbas Yagoubi, Department of Computer Science, Faculty of Exact and Applied Sciences, University Oran1 Ahmed Ben Bella, Oran, Algeria
ABSTRACT
Cloud computing provides IT resources (e.g., CPU, memory, network, storage) based on
virtualization concepts and a pay-as-you-go principle. It comprises a collection of inter-related,
virtualized computing resources that are presented as one or more unified computing
resources. With the development of computerized scientific workflows, the amount of data is
increasing exponentially. Workflow scheduling and data replication are considered two major
challenges in cloud computing; nevertheless, many researchers address scheduling and data replication
separately. In this article, a workflow scheduling strategy based on data clustering is combined with
a dynamic data replication strategy. The aim of the proposed algorithm is to minimize the completion
time and the transfer time. Its performance has been evaluated on several metrics using the
CloudSim toolkit.
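The abstract states the objective (minimizing completion time and transfer time) without detailing the algorithm. The following is a minimal, hypothetical Python sketch, not the authors' algorithm, of how a scheduler can weigh data transfer time against execution time while creating replicas dynamically. All names (`VM`, `Task`, `transfer_time`, `schedule`), the single uniform inter-site bandwidth, and the replicate-on-access rule are illustrative assumptions; the data clustering step of the proposed approach is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    site: str                # data center hosting the VM (hypothetical model)
    mips: float              # speed in million instructions per second
    ready_time: float = 0.0  # time at which the VM becomes free


@dataclass
class Task:
    name: str
    length_mi: float                                 # length in million instructions
    inputs: list[str] = field(default_factory=list)  # names of required data sets


def transfer_time(task, site, replicas, sizes_mb, bandwidth_mbps):
    """Seconds needed to fetch every input of `task` that has no replica at `site`."""
    missing = [d for d in task.inputs if site not in replicas[d]]
    return sum(sizes_mb[d] * 8 / bandwidth_mbps for d in missing)


def schedule(tasks, vms, replicas, sizes_mb, bandwidth_mbps):
    """Greedy placement: each task goes to the VM with the earliest estimated
    completion time (ready time + transfer time + execution time). After the
    placement, every input is kept as a replica at the chosen site
    (a simple replicate-on-access rule, assumed for illustration)."""
    plan = {}
    for task in tasks:
        def finish(vm):
            return (vm.ready_time
                    + transfer_time(task, vm.site, replicas, sizes_mb, bandwidth_mbps)
                    + task.length_mi / vm.mips)

        best = min(vms, key=finish)
        best.ready_time = finish(best)  # computed before replicas are updated
        for d in task.inputs:           # dynamic replication step
            replicas[d].add(best.site)
        plan[task.name] = (best.name, best.ready_time)
    makespan = max(vm.ready_time for vm in vms)
    return plan, makespan


# Usage: one 800 MB data set stored only at site A, 100 Mbps between sites.
vms = [VM("vm1", "A", 1000.0), VM("vm2", "B", 2000.0)]
tasks = [Task(f"t{i}", 20000.0, ["d1"]) for i in range(1, 6)]
replicas = {"d1": {"A"}}
plan, makespan = schedule(tasks, vms, replicas, {"d1": 800.0}, 100.0)
print(plan)
print("replicas:", replicas, "makespan:", makespan)
```

In this toy run the first three tasks stay on `vm1`, next to the data, because fetching `d1` to site B costs 64 s. Once `vm1` is busy enough, `t4` is worth moving to the faster `vm2`; that transfer leaves a replica at site B, so any later task placed there would skip the transfer.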
KEYWORDS
Cloud Computing, Clustering, Dynamic Replication, Scheduling, Scientific Workflow, Virtual Machines
1. INTRODUCTION
Owing to the development of virtualization and Internet technologies, cloud computing has emerged
as a new computing platform. It provides high-performance computing resources and
mass storage resources that are used in large-scale scientific applications such as high-energy physics,
bioinformatics, and climate modeling (Jang, Kim, Kim, & Lee, 2012).
Cloud computing provides one or more consolidated IT resources based on a service level agreement (SLA) between service
providers and service consumers (Goyal & Agrawal, 2013). The data requirements of these scientific
applications have been growing at an unprecedented rate in both volume and scale, with huge input
data sets. Data plays an important role in scientific workflow applications: the jobs submitted by
users require huge, geographically distributed input data sets, and transferring such large data sets
takes a tremendous amount of time. Scheduling and replication are two well-known techniques for
boosting the performance of cloud computing. The literature offers many techniques that try to reduce
job execution time in order to achieve high performance and good throughput; job scheduling and data
replication algorithms are among them. When these techniques are implemented in a cloud environment,
they give different results, and many researchers day by day are