Optimizing MapReduce Framework through Joint Scheduling of Overlapping Phases

Huanyang Zheng, Ziqi Wan, and Jie Wu
Department of Computer and Information Sciences, Temple University, USA
Email: {huanyang.zheng, ziqi.wan, jiewu}@temple.edu

Abstract—MapReduce includes three phases: map, shuffle, and reduce. Since the map phase is CPU-intensive and the shuffle phase is I/O-intensive, these phases can be conducted in parallel. This paper studies a joint scheduling optimization of the overlapping map and shuffle phases to minimize the average job makespan. The challenge comes from the dependency between the map and shuffle phases: the shuffle phase may have to wait for data emitted by the map phase. We introduce a new concept, the strong pair. Two jobs form a strong pair if the shuffle and map workloads of one job equal the map and shuffle workloads of the other, respectively. We prove that, if the entire set of jobs can be decomposed into strong pairs, then the optimal schedule executes jobs pairwise, matching jobs that form strong pairs. Following this intuition, several offline and online scheduling policies are proposed. They first group jobs according to their workloads, and then execute the jobs within each group in a pairwise manner. Experiments driven by real data validate the efficiency and effectiveness of the proposed policies.

Index Terms—MapReduce framework, map and shuffle phases, joint scheduling, makespan optimization.

I. INTRODUCTION

MapReduce [1] is a well-known programming framework used to process the ever-growing amount of data collected by modern instruments, such as the Large Hadron Collider and next-generation gene sequencers. Although MapReduce has been widely adopted in many data centers, further improvements are still needed to meet the huge demands of big data computing. In the current MapReduce framework, each job consists of three dependent phases: map, shuffle, and reduce.
The map and reduce phases typically deal with a large amount of data computation, while the shuffle phase handles the data transfer among different MapReduce workers. In terms of resource demand, the map and reduce phases are CPU-intensive, while the shuffle phase is I/O-intensive. Currently, most state-of-the-art research on MapReduce optimization focuses on the map and reduce phases. However, the shuffle phase also plays an important role in transferring data from map workers to reduce workers. It has a significant impact on the average job makespan, especially when the data is big. Moreover, Chen et al. [2] reported that jobs processed by the Facebook MapReduce cluster are shuffle-heavy. Consequently, this paper studies a joint scheduling optimization of the map and shuffle phases to minimize the average job makespan (the time span from job arrival to shuffle-phase completion). The reduce phase is not jointly optimized, since its workload is relatively light: according to [3], only 7% of jobs in a production MapReduce cluster are reduce-heavy.

[Fig. 1. An example for the joint scheduling of overlapping phases: (a) schedule one; (b) schedule two. Each subfigure plots the map CPU utilization and shuffle I/O utilization of jobs J1 and J2 over time.]

Our key observation is that the map and shuffle phases have different resource demands. Since the map phase is CPU-intensive and the shuffle phase is I/O-intensive, they can potentially be conducted in parallel to minimize the average job makespan. The key challenge comes from the fact that the map and shuffle phases cannot be fully parallelized, due to their dependency relationship. The shuffle phase of a job must start later than its map phase, and cannot finish earlier than its map phase, because the shuffle phase may have to wait for the data emitted by the map phase.
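To make this dependency concrete, consider a single job with a map workload of m time slots and a shuffle workload of s time slots. Under the constant-emission assumption used throughout this paper, by time t the map has emitted a t/m fraction of the data, so the shuffle can never be ahead of that fraction. When both phases start together, the job therefore completes at max(m, s). The following sketch illustrates this consequence of the model; the function name and the closed form are our illustration, not notation from the paper:

```python
def single_job_finish(map_slots: float, shuffle_slots: float) -> float:
    """Earliest completion time of one job run in isolation.

    Model (an illustrative assumption): the map emits data at a constant
    rate, so by time t a t/map_slots fraction of the data exists; the
    shuffle needs shuffle_slots at full I/O rate but can only transfer
    data that has already been emitted. Hence the shuffle cannot finish
    before the map, and the job completes at the later of the two
    phase lengths.
    """
    return max(map_slots, shuffle_slots)
```

For the two jobs of Fig. 1 run in isolation, single_job_finish(1, 2) and single_job_finish(2, 1) both evaluate to 2 time slots; the schedules only diverge once the jobs must share the CPU and I/O resources.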
A classic example is WordCount [4], in which the map workers emit key-value pairs at a certain rate to be shuffled to the reduce workers. If the map workload of a job is larger than its shuffle workload, then the I/O resource may be underutilized, leading to a non-optimal job schedule. To illustrate this motivation more clearly, an example is shown in Fig. 1, which involves two jobs, J1 and J2. J1 is shuffle-heavy and J2 is map-heavy. Assuming that the resources are fully utilized, the map and shuffle phases of J1 take 1 and 2 time slots, respectively. The resource demand of J2 is the opposite of that of J1 (1 time slot for the shuffle phase and 2 time slots for the map phase). As shown in Fig. 1(a), schedule one executes J2 first, leading to underutilization of the I/O resource, because J2's shuffle phase must wait for the data emitted by its map phase (suppose a constant data emission rate). Consequently, schedule one takes 4 time slots to finish all the jobs. As shown in Fig. 1(b), schedule two is a better scheme: it executes J1 first, and takes only 3 time slots to finish all the jobs. It can be seen that, to maximally utilize the I/O resource, the shuffle-heavy job should be executed earlier than the map-heavy job.
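The two schedules of Fig. 1 can be replayed with a small discrete-time simulation. This is a sketch under simplifying assumptions (one CPU, one I/O channel, jobs mapped and shuffled in the given order, and the constant data-emission rate mentioned above); the makespan function and its step size dt are our illustrative constructs, not the paper's formal model:

```python
def makespan(jobs, dt=0.001):
    """Simulate running (map_slots, shuffle_slots) jobs in order.

    One CPU serves the map phases sequentially; one I/O channel serves
    the shuffle phases sequentially in the same order. A job's shuffle
    may overlap its own map, but the shuffled fraction of its data can
    never exceed the fraction the map has emitted so far (constant
    emission rate). Returns the total completion time.
    """
    t = 0.0
    cpu_job, io_job = 0, 0                 # indices of the active jobs
    map_done = [0.0] * len(jobs)           # completed map work per job
    shuf_done = [0.0] * len(jobs)          # completed shuffle work per job
    while io_job < len(jobs):
        # Advance the map phase of the current CPU job.
        if cpu_job < len(jobs):
            m = jobs[cpu_job][0]
            map_done[cpu_job] = min(m, map_done[cpu_job] + dt)
            if map_done[cpu_job] >= m:
                cpu_job += 1
        # Advance the shuffle phase, capped by the emitted-data fraction.
        m, s = jobs[io_job]
        emitted = (map_done[io_job] / m) * s   # shuffle-able work so far
        shuf_done[io_job] = min(s, emitted, shuf_done[io_job] + dt)
        if shuf_done[io_job] >= s:
            io_job += 1
        t += dt
    return t
```

With J1 = (1, 2) and J2 = (2, 1) as in Fig. 1, makespan([(2, 1), (1, 2)]) (schedule one, J2 first) comes out near 4 time slots, while makespan([(1, 2), (2, 1)]) (schedule two, J1 first) comes out near 3, reproducing the figure's comparison.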