DOI: 10.4018/IJSITA.2017100104
International Journal of Strategic Information Technology and Applications
Volume 8 • Issue 4 • October-December 2017
Copyright © 2017, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Dynamic Data Replication Based on Tasks Scheduling for Cloud Computing Environment

Siham Kouidri, Department of Computer Science, Faculty of Exact and Applied Sciences, University Oran1 Ahmed Ben Bella, Oran, Algeria
Belabbas Yagoubi, Department of Computer Science, Faculty of Exact and Applied Sciences, University Oran1 Ahmed Ben Bella, Oran, Algeria

ABSTRACT

Cloud computing provides IT resources (e.g., CPU, memory, network, storage) based on virtualization concepts and a pay-as-you-go principle. It comprises a collection of interrelated, virtualized computing resources that are managed as one or more unified computing resources. With the development of computerized scientific workflows, the amount of data is increasing exponentially. Workflow scheduling and data replication are considered major challenges in cloud computing. However, many researchers focus on scheduling or data replication separately. This article introduces a combination of workflow scheduling based on data clustering and a dynamic data replication strategy. The aim of the proposed algorithm is to minimize completion time and transfer time. Its performance has been evaluated on several metrics using the CloudSim toolkit.

KEYWORDS

Cloud Computing, Clustering, Dynamic Replication, Scheduling, Scientific Workflow, Virtual Machines

1. INTRODUCTION

Due to the development of virtualization and Internet technologies, cloud computing has emerged as a new computing platform.
Cloud computing provides high-performance computing resources and mass storage resources that are used in large-scale scientific applications such as high energy physics, bioinformatics, and climate modeling (Jang, Kim, Kim, & Lee, 2012). It provides one or more consolidated IT resources based on a service level agreement (SLA) between service providers and service consumers (Goyal & Agrawal, 2013). The data requirements of these scientific applications have been growing at an unprecedented rate in both volume and scale, with huge input data sets. In scientific workflow applications, data plays an important role. The jobs submitted by users in these applications require huge input data sets that are geographically distributed, and transferring such large data sets takes a tremendous amount of time. Scheduling and replication are two well-known techniques for boosting the performance of cloud computing. The literature contains many techniques that try to reduce job execution time in order to achieve high performance and good throughput; among them are job scheduling and data replication algorithms. When implemented in a cloud environment, these techniques give different results, and many researchers day by day are