Optimizing MapReduce Framework through Joint Scheduling of Overlapping Phases

Huanyang Zheng, Ziqi Wan, and Jie Wu
Department of Computer and Information Sciences, Temple University, USA
Email: {huanyang.zheng, ziqi.wan, jiewu}@temple.edu

Abstract—MapReduce includes three phases: map, shuffle, and reduce. Since the map phase is CPU-intensive and the shuffle phase is I/O-intensive, these phases can be conducted in parallel. This paper studies a joint scheduling optimization of the overlapping map and shuffle phases to minimize the average job makespan. The challenge comes from the dependency between the map and shuffle phases: the shuffle phase may have to wait for data emitted by the map phase. We introduce a new concept, the strong pair. Two jobs form a strong pair if the shuffle and map workloads of one job equal the map and shuffle workloads of the other, respectively. We prove that, if the entire set of jobs can be decomposed into strong pairs, then the optimal schedule executes jobs pairwise, matching jobs that form strong pairs. Following this intuition, several offline and online scheduling policies are proposed. They first group jobs according to their workloads, and then execute the jobs within each group in a pairwise manner. Experiments driven by real data validate the efficiency and effectiveness of the proposed policies.

Index Terms—MapReduce framework, map and shuffle phases, joint scheduling, makespan optimization.

I. INTRODUCTION

MapReduce [1] is a well-known programming framework used to process the ever-growing amount of data collected by modern instruments, such as the Large Hadron Collider and next-generation gene sequencers. Although MapReduce has been widely adopted in many data centers, further improvements are still needed to meet the huge demands of big data computing. In the current MapReduce framework, each job consists of three dependent phases: map, shuffle, and reduce.
The map and reduce phases typically deal with a large amount of data computation, while the shuffle phase handles the data transfer among different MapReduce workers. In terms of resource demand, the map and reduce phases are CPU-intensive, while the shuffle phase is I/O-intensive. Currently, most state-of-the-art research on MapReduce optimization focuses on the map and reduce phases. However, the shuffle phase also plays an important role in transferring data from map workers to reduce workers. It has a significant impact on the average job makespan, especially when the data is big. Moreover, Chen et al. [2] reported that jobs processed by the Facebook MapReduce cluster are shuffle-heavy. Consequently, this paper studies a joint scheduling optimization of the map and shuffle phases to minimize the average job makespan (the time span from job arrival to shuffle-phase completion). The reduce phase is not jointly optimized, since its workload is relatively light: according to [3], only 7% of jobs in a production MapReduce cluster are reduce-heavy.

[Fig. 1. An example for the joint scheduling of overlapping phases: (a) schedule one; (b) schedule two. Each subfigure plots the map CPU utilization and shuffle I/O utilization of jobs J1 and J2 over time.]

Our key observation is that the map and shuffle phases have different resource demands. Since the map phase is CPU-intensive and the shuffle phase is I/O-intensive, they can potentially be conducted in parallel to minimize the average job makespan. The key challenge comes from the fact that the map and shuffle phases cannot be fully parallelized, due to their dependency relationship. The shuffle phase of a job must start later than its map phase, and cannot finish earlier than its map phase, because the shuffle phase may have to wait for the data emitted by the map phase.
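To make this dependency concrete, consider a single job with a map workload of m time slots and a shuffle workload of s time slots. Under the constant-emission assumption used throughout this paper, by time t the map has emitted a t/m fraction of the data, so the shuffle can never be ahead of that fraction. When both phases start together, the job therefore completes at max(m, s). The following sketch illustrates this consequence of the model; the function name and the closed form are our illustration, not notation from the paper:

```python
def single_job_finish(map_slots: float, shuffle_slots: float) -> float:
    """Earliest completion time of one job run in isolation.

    Model (an illustrative assumption): the map emits data at a constant
    rate, so by time t a t/map_slots fraction of the data exists; the
    shuffle needs shuffle_slots at full I/O rate but can only transfer
    data that has already been emitted. Hence the shuffle cannot finish
    before the map, and the job completes at the later of the two
    phase lengths.
    """
    return max(map_slots, shuffle_slots)
```

For the two jobs of Fig. 1 run in isolation, single_job_finish(1, 2) and single_job_finish(2, 1) both evaluate to 2 time slots; the schedules only diverge once the jobs must share the CPU and I/O resources.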
A classic example is WordCount [4], in which the map workers emit key-value pairs at a certain rate to be shuffled to the reduce workers. If the map workload of a job is larger than its shuffle workload, then the I/O resource may be underutilized, leading to a non-optimal job schedule. To illustrate this motivation more clearly, an example is shown in Fig. 1, which involves two jobs, J1 and J2. J1 is shuffle-heavy and J2 is map-heavy. Assuming that the resources are fully utilized, the map and shuffle phases of J1 take 1 and 2 time slots, respectively. The resource demand of J2 is the opposite of that of J1 (1 time slot for the shuffle phase and 2 time slots for the map phase). As shown in Fig. 1(a), schedule one executes J2 first, leading to underutilization of the I/O resource, because J2's shuffle phase must wait for the data emitted by its map phase (suppose a constant data emission rate). Consequently, schedule one takes 4 time slots to finish all the jobs. As shown in Fig. 1(b), schedule two is a better scheme: it executes J1 first, and takes only 3 time slots to finish all the jobs. It can be seen that, to maximally utilize the I/O resource, the shuffle-heavy job should be executed earlier than the map-heavy job.
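The two schedules of Fig. 1 can be replayed with a small discrete-time simulation. This is a sketch under simplifying assumptions (one CPU, one I/O channel, jobs mapped and shuffled in the given order, and the constant data-emission rate mentioned above); the makespan function and its step size dt are our illustrative constructs, not the paper's formal model:

```python
def makespan(jobs, dt=0.001):
    """Simulate running (map_slots, shuffle_slots) jobs in order.

    One CPU serves the map phases sequentially; one I/O channel serves
    the shuffle phases sequentially in the same order. A job's shuffle
    may overlap its own map, but the shuffled fraction of its data can
    never exceed the fraction the map has emitted so far (constant
    emission rate). Returns the total completion time.
    """
    t = 0.0
    cpu_job, io_job = 0, 0                 # indices of the active jobs
    map_done = [0.0] * len(jobs)           # completed map work per job
    shuf_done = [0.0] * len(jobs)          # completed shuffle work per job
    while io_job < len(jobs):
        # Advance the map phase of the current CPU job.
        if cpu_job < len(jobs):
            m = jobs[cpu_job][0]
            map_done[cpu_job] = min(m, map_done[cpu_job] + dt)
            if map_done[cpu_job] >= m:
                cpu_job += 1
        # Advance the shuffle phase, capped by the emitted-data fraction.
        m, s = jobs[io_job]
        emitted = (map_done[io_job] / m) * s   # shuffle-able work so far
        shuf_done[io_job] = min(s, emitted, shuf_done[io_job] + dt)
        if shuf_done[io_job] >= s:
            io_job += 1
        t += dt
    return t
```

With J1 = (1, 2) and J2 = (2, 1) as in Fig. 1, makespan([(2, 1), (1, 2)]) (schedule one, J2 first) comes out near 4 time slots, while makespan([(1, 2), (2, 1)]) (schedule two, J1 first) comes out near 3, reproducing the figure's comparison.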