International Journal of Computer Science Trends and Technology (IJCST) – Volume 5 Issue 2, Mar – Apr 2017
ISSN: 2347-8578                                          www.ijcstjournal.org

An Efficient Dynamic Slot Allocation Based on Fairness Consideration for MapReduce Clusters

T. P. Simi Smirthiga [1], P. Sowmiya [2], C. Vimala [3], Mrs. P. Anantha Prabha [4]
U.G. Scholar [1], [2] & [3], Associate Professor [4]
Department of Computer Science & Engineering
Sri Krishna College of Technology, Kovaipudur, Coimbatore
Tamil Nadu - India

ABSTRACT
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. However, slot utilization can be low, especially when the Hadoop Fair Scheduler (HFS) is used, because slots are pre-allocated separately to map and reduce tasks, and because map tasks run before reduce tasks in a typical MapReduce job. This problem can be solved by permitting slots to be dynamically (re)allocated to either map or reduce tasks according to their actual demand. HFS uses a two-level hierarchy: at the top level, task slots are allocated across "pools", and at the second level, slots are allocated among the jobs within each pool. We therefore propose two types of Dynamic Hadoop Fair Scheduler (DHFS), corresponding to two different levels of fairness (pool level and cluster level), to improve the makespan. Experimental results show that the proposed DHFS improves system performance significantly (by 32% - 55% for a single job and 44% - 68% for multiple jobs) while guaranteeing fairness.

Keywords:- MapReduce, NameNode, JobTracker, TaskTracker, DataNode

I. INTRODUCTION
MapReduce has become an important paradigm for parallel data-intensive cluster programming due to its simplicity and flexibility.
Essentially, it is a software framework that allows a cluster of computers to process a large set of structured or unstructured data in parallel. The number of MapReduce users is growing quickly. Apache Hadoop, an open-source implementation of MapReduce, has been widely adopted in industry. With the rise of cloud computing, it has become convenient for IT businesses to set up a cluster of servers in the cloud and launch batches of MapReduce jobs. As a result, a large variety of data-intensive applications now use the MapReduce framework.

In a classic Hadoop system, each MapReduce job is partitioned into small tasks that are distributed and executed across multiple machines. There are two kinds of tasks: map tasks and reduce tasks. Each map task applies the same map function to a block of the input data and produces intermediate results in the form of key-value pairs. The intermediate data are partitioned by a hash function and fetched by the corresponding reduce tasks as their inputs. Once all of its intermediate data have been fetched, a reduce task starts to execute and produces the final results.

The Hadoop implementation closely follows the MapReduce framework. A single master node manages the distributed slave nodes. The master communicates with the slaves through heartbeat messages, which carry the slaves' status information. Job scheduling is performed by a centralized job tracker routine on the master node. The scheduler assigns tasks to slave nodes that have free resources, responding through the same heartbeat mechanism. The resources on each slave node are represented as map/reduce slots: each slave node has a fixed number of slots, and each map slot processes only one map task at a time.

II. LITERATURE REVIEW
The bi-criteria algorithm for job scheduling presents a new method for building an efficient algorithm for scheduling jobs in a cluster.
Here, jobs are considered as parallel tasks (PT) that can be scheduled on any number of processors. The key idea is to consider two criteria that are optimized together. These criteria are the
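The map, shuffle, and reduce phases described in the introduction can be illustrated with a small in-process sketch. This is a toy word-count example, not the Hadoop API; the function names (`map_func`, `partition`, `reduce_func`) and the two-reducer setup are illustrative assumptions.

```python
def map_func(block):
    """Map task: emit (word, 1) key-value pairs for one input block."""
    return [(word, 1) for word in block.split()]

def partition(pairs, num_reducers):
    """Hash-partition intermediate key-value pairs among reduce tasks."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets

def reduce_func(pairs):
    """Reduce task: sum the values for each key in its partition."""
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

# Drive two map tasks and two reduce tasks over toy input blocks.
blocks = ["the cat sat", "the dog sat"]
intermediate = [kv for block in blocks for kv in map_func(block)]
results = {}
for bucket in partition(intermediate, 2):
    results.update(reduce_func(bucket))
```

In a real Hadoop cluster each `map_func` invocation would run in a map slot on a slave node and each partition would be fetched over the network by a reduce slot; here everything runs in one process purely to show the data flow.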
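The dynamic slot (re)allocation idea from the abstract can also be sketched: idle slots of one type are lent to pending tasks of the other type, instead of sitting unused under a static pre-allocation. This is a minimal illustration of the concept, not the authors' DHFS implementation; the function name and its greedy borrowing policy are assumptions.

```python
def allocate_slots(total_map_slots, total_reduce_slots,
                   pending_map_tasks, pending_reduce_tasks):
    """Return (map_tasks_running, reduce_tasks_running) after a
    dynamic allocation round that lets idle slots of one type be
    borrowed by pending tasks of the other type."""
    # Static allocation: each task type uses only its own slots.
    map_running = min(total_map_slots, pending_map_tasks)
    reduce_running = min(total_reduce_slots, pending_reduce_tasks)

    # Dynamic step: lend idle map slots to waiting reduce tasks,
    # and idle reduce slots to waiting map tasks.
    idle_map = total_map_slots - map_running
    idle_reduce = total_reduce_slots - reduce_running
    reduce_running += min(idle_map, pending_reduce_tasks - reduce_running)
    map_running += min(idle_reduce, pending_map_tasks - map_running)
    return map_running, reduce_running
```

For example, with 10 map and 10 reduce slots but 15 pending map tasks and only 2 pending reduce tasks, a static scheduler runs 10 map tasks while 8 reduce slots sit idle; the dynamic step lends 5 of those idle reduce slots to map tasks. The full DHFS additionally enforces fairness at the pool or cluster level when deciding who may borrow.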