A Novel Scheduling Algorithm for Improving Performance in Heterogeneous Cloud Environments

Hadi Yazdanpanah, Department of Computer, Bushehr Branch, Islamic Azad University, Bushehr, Iran, Hadiyazdanpanah@outlook.com

Seyed Javad Mirabedini, Ali Harounabadi, Department of Computer, Central Tehran Branch, Islamic Azad University, Tehran, Iran, J_Mirabedini@iauctb.ac.ir, a.harounabadi@iauctb.ac.ir

Abstract— MapReduce has been widely adopted as a Big Data processing platform and has become a popular parallel computing framework for large-scale data processing in cloud computing. It is best suited for embarrassingly parallel, data-intensive tasks. Scheduling is one of the most critical aspects of MapReduce, and three important scheduling issues arise in it: locality, synchronization, and fairness. One of the main objectives of this paper is to propose a scheduling algorithm that achieves fairness and good performance for users requesting resources in a heterogeneous cluster in a cost-effective manner. To minimize cost, resources are allocated on the basis of service level agreements, and we propose a scheduling algorithm for the MapReduce platform that is suitable for heterogeneous systems and can substantially reduce the total completion time of a MapReduce job, since traditional MapReduce schedulers are usually unable to identify slow jobs. With this method, job completion times and overall energy costs can be minimized.

Keywords-MapReduce; Scheduling Algorithms; Hadoop; Cloud Computing; Heterogeneous Environment

I. INTRODUCTION

Nowadays, data are becoming larger in every field. Various computing models, including cluster computing, volunteer computing, peer-to-peer computing, grid computing and, more recently, cloud computing, have been proposed. Cloud computing is a new style of computing that is progressing steadily.
Hence, the demand for massive data processing has grown with the emergence of cloud computing as one of the most popular technologies for providing services to users, and this has caused cloud computing systems to gain popularity dramatically. Cloud providers rent out computing resources (e.g., Amazon Elastic Compute Cloud, Microsoft Azure) according to users' requests, and users pay only for the resources they consume; providers serve many users on the same physical infrastructure. In such a "pay-per-use" model, workflow execution cost must be considered during scheduling, based on users' QoS constraints. The main service models of cloud computing are Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [1]. In IaaS, users request a number of virtual machines for a period of time, and the computing resources are provided to users in the form of Virtual Machines (VMs) that are subsequently customized by the user. Cloud computing was pioneered to provide resources efficiently, and the main objective of providers is to earn as much as possible with minimal investment. Cloud computing environments have two main actors: cloud providers and cloud users. On one side, providers host large processing resources in data centers and rent them to users. On the other side, users run programs with variable computational load and rent the resources needed to execute those programs from providers. However, the two actors do not share information with each other, which makes resource allocation more difficult; for example, providers and users do not want to disclose the details of their workloads to one another.
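The pay-per-use model described above can be made concrete with a minimal sketch. The cost model, rates, and budget check below are illustrative assumptions, not the paper's algorithm or any real provider's prices; costs are kept in integer cents to avoid floating-point rounding.

```python
# Hypothetical pay-per-use cost model: the total cost of a workflow is
# the per-hour VM rate times the hours each rented VM is used.
# Rates and hours are illustrative only.

def workflow_cost_cents(vm_hours, rate_cents_per_hour):
    """vm_hours: list of hours each rented VM was used."""
    return sum(hours * rate_cents_per_hour for hours in vm_hours)

def within_budget(vm_hours, rate_cents_per_hour, budget_cents):
    """A QoS-aware scheduler would reject plans exceeding the user's budget."""
    return workflow_cost_cents(vm_hours, rate_cents_per_hour) <= budget_cents

# Three VMs used for 3, 5, and 2 hours at 10 cents/hour:
print(workflow_cost_cents([3, 5, 2], 10))  # 10 VM-hours -> 100 cents
print(within_budget([3, 5, 2], 10, 80))    # over an 80-cent budget -> False
```

A scheduler operating under such a model would compare candidate task placements by this cost while still meeting the user's deadline and other QoS constraints.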
MapReduce [3], proposed by Google in 2004, has become a popular parallel programming model for large-scale data processing in cloud-based environments, and Hadoop [2], Apache's open-source implementation of the MapReduce framework, is widely used in both industry and academic research. MapReduce is best suited for embarrassingly parallel, data-intensive tasks. It is designed to read large amounts of data stored in a distributed file system such as the Google File System (GFS) [4], process the data in parallel, and aggregate and store the results back in the distributed file system. In a typical MapReduce job, the master divides the input files into multiple map tasks and then schedules both map tasks and reduce tasks on worker nodes in a cluster to achieve parallel processing. The two major performance metrics in MapReduce are job execution time and cluster throughput.

A heterogeneous computing environment is a large-scale distributed environment for data processing whose behavior depends on several application parameters. Such an environment can be characterized along three main dimensions: hardware, communication layers, and software. A heterogeneous computer system combines hardware and software from two or more different vendors. Unlike homogeneous systems, heterogeneous systems run some jobs faster on particular nodes than on others, whereas many cloud applications largely assume a homogeneous environment. A heterogeneous system architecture can combine multiple CPUs and GPUs to obtain the desired benefits. One of the most important objectives of this paper is to propose a scheduling algorithm that achieves fairness and

International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 10, October 2016, https://sites.google.com/site/ijcsis/, ISSN 1947-5500
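The map/shuffle/reduce flow of a typical MapReduce job described earlier can be sketched in miniature with the classic word-count example. This is a single-process illustration under simplifying assumptions, not Hadoop's actual implementation: in a real cluster the master splits the input, runs map tasks on worker nodes, shuffles intermediate pairs by key, and executes reduce tasks in parallel.

```python
from collections import defaultdict

def map_phase(split):
    # Map: emit a (word, 1) pair for every word in one input split.
    for word in split.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Reduce: aggregate all counts emitted for the same key.
    return (key, sum(values))

def run_job(splits):
    # Shuffle: group intermediate (key, value) pairs by key.
    grouped = defaultdict(list)
    for split in splits:
        for key, value in map_phase(split):
            grouped[key].append(value)
    # Each key group is reduced independently, which is what makes
    # the reduce phase parallelizable across worker nodes.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

result = run_job(["the quick brown fox", "the lazy dog"])
print(result["the"])  # "the" appears in both splits -> 2
```

The scheduler's job, which this paper targets, is deciding which worker node runs each of these map and reduce tasks; in a heterogeneous cluster that placement strongly affects both job execution time and cluster throughput.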