Restricted duplication based MILP formulation for scheduling task graphs on unrelated parallel machines

Jagpreet Singh, Bhargav Mangipudi, Sandeep Betha and Nitin Auluck
Department of Computer Science & Engineering
Indian Institute of Technology Ropar
Rupnagar, India, 140001
Email: {jagpreets, bhargavm, sandeepb, nitin}@iitrpr.ac.in

Abstract: Duplication has proved to be a vital technique for scheduling task graphs on a network of unrelated parallel machines. Few attempts have been made to model duplication in a Mixed Integer Linear Program (MILP) to reduce schedule length. Other known optimal MILPs duplicate a job on all the available processing elements, which increases their complexity. This paper proposes a new REStricted Duplication (RESDMILP) approach to model duplication in a MILP. The complexity of this model increases with the amount of duplication. Experiments reveal that RESDMILP achieves better runtimes when the problem instance is solved optimally, and provides a better lower bound and percentage gap when it is run for a fixed amount of time. The percentage gap is defined as $(UB - LB)/UB$, where $UB$ and $LB$ are the upper and lower bounds achieved by the MILPs, respectively.

Keywords: MILP, Scheduling, Heterogeneous, Duplication, Multiprocessors

I. INTRODUCTION

The problem of scheduling tasks on multiprocessors has received considerable attention from researchers. The primary reason for this is the continuous increase in the complexity of the problem with the introduction of new computing and network architectures. Task scheduling on multiprocessors has been proven to be an NP-hard problem in general [1], as well as in a number of restricted cases [2]. The multiprocessors in scheduling theory have been broadly classified into the following three categories:

• Identical: The computational capability of all the multiprocessors is exactly the same. The input in this case is described as a job-time pair $(J_i, t_i)$, i.e.
the $i$th job will take $t_i$ time units to execute on any processor. Multi-core architectures fall into this category.
• Uniform: In this architecture, each multiprocessor has its own speed, denoted as $s_i$. If a job has been assigned for $t$ time units on the $i$th multiprocessor, then it completes $s_i \cdot t$ units of execution.
• Unrelated or Heterogeneous: This is the most general architecture, in which multiprocessors of different capabilities work in parallel. A job $j_i$ executes at a rate $r_{i,j}$ that depends on the processor $p_j$ it runs on. Hence, assigning $t$ time units on processor $p_j$ completes $r_{i,j} \cdot t$ units of execution. Architectures consisting of CPU, GPU, GPGPU and other embedded processors are characterized as unrelated.

In this paper, we consider the scheduling of task graphs on a network of unrelated parallel machines, and the goal is to minimize the maximum schedule length, the makespan. Graham et al. [3] presented a 3-tuple notation to represent classes of scheduling problems. This problem is represented as $R|prec, comm|C_{\max}$, where $R$ represents the unrelated parallel processing elements (PEs). The fields $prec$ and $comm$ indicate the precedence constraints and the communication costs incurred if jobs are scheduled on different machines, and finally $C_{\max}$ denotes the objective of minimizing the makespan.

The NP-hard nature of the scheduling problem has led researchers to look for efficient approximation algorithms and heuristics. A 2-factor approximation algorithm has been proposed in [4] for the case with no precedence constraints, i.e. $R||C_{\max}$. Lenstra et al. [5] demonstrated that, unless P = NP, no polynomial-time approximation algorithm for $R||C_{\max}$ achieves a factor better than 1.5 of the optimal solution. A survey on the work carried out for $R|prec|C_{\max}$ [6] showed that there does not exist any non-trivial approximation factor for this problem under generalized precedence constraints.
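The three machine models described above differ only in what the execution rate depends on. The following is a minimal sketch of that distinction; the function names and the sample speeds and rates are hypothetical and chosen purely for illustration, not taken from the paper.

```python
from typing import Sequence


def rate_identical(job: int, proc: int) -> float:
    """Identical machines: every job runs at the same rate everywhere."""
    return 1.0


def rate_uniform(speeds: Sequence[float], job: int, proc: int) -> float:
    """Uniform machines: the rate depends only on the processor's speed."""
    return speeds[proc]


def rate_unrelated(rates: Sequence[Sequence[float]], job: int, proc: int) -> float:
    """Unrelated machines: the rate depends on both the job and the processor."""
    return rates[job][proc]


def units_completed(rate: float, t: float) -> float:
    """Units of execution completed in t time units at the given rate."""
    return rate * t


# Hypothetical data: two processors, two jobs.
speeds = [1.0, 2.0]      # speed of each processor (uniform model)
rates = [[1.0, 4.0],     # rates[job][proc] (unrelated model)
         [3.0, 0.5]]

identical_units = units_completed(rate_identical(0, 1), 3.0)         # 3.0
uniform_units = units_completed(rate_uniform(speeds, 0, 1), 3.0)     # 6.0
unrelated_units = units_completed(rate_unrelated(rates, 1, 1), 3.0)  # 1.5
```

In the sample rates, job 0 runs fastest on the second processor while job 1 runs fastest on the first. No single speed per processor can express this, which is what makes the unrelated model the most general of the three.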

DAG scheduling heuristics on multiprocessors (homogeneous and heterogeneous) can be broadly classified into list-based and clustering-based approaches, each with or without duplication. Among these, the concept of duplication [7] has received considerable interest. By duplicating heavily communicating jobs on the same processor as their successors, the interprocessor communication costs can be minimized, which can reduce the makespan. In most cases, the predecessors of a job are duplicated in available schedule holes that develop because of precedence constraints. Figure 1 demonstrates the significance of duplication, where job 1 is duplicated on processors $P_2$ and $P_3$ to avoid the communication costs to its children 3 and 4. This reduces the makespan to 5. It has been observed that duplication plays a significant role in obtaining the optimal solution to this scheduling problem [8].
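The effect of duplication described above can be illustrated on a small fork graph with one parent job and two children. The sketch below is hypothetical: the function `fork_makespan`, the execution times and the communication cost are assumptions made for illustration and are not the instance shown in Figure 1. For simplicity it also assumes that each job takes the same time on every processor, which the unrelated model does not require.

```python
def fork_makespan(parent_procs, child_procs, parent_time, child_time, comm):
    """Makespan of a fork graph: one parent job feeding several children.

    parent_procs: processors holding a copy of the parent (each copy starts at 0).
    child_procs:  processor chosen for each child, in scheduling order.
    comm:         delay paid when a child reads the parent's output
                  from a different processor.
    """
    parent_finish = {p: parent_time for p in parent_procs}
    free_at = dict(parent_finish)          # time at which each processor is idle
    makespan = max(parent_finish.values())
    for p in child_procs:
        # Earliest arrival of the parent's output on p; a local copy is free.
        ready = min(f + (0 if q == p else comm)
                    for q, f in parent_finish.items())
        start = max(ready, free_at.get(p, 0))
        free_at[p] = start + child_time
        makespan = max(makespan, free_at[p])
    return makespan


# Hypothetical instance: parent takes 1, each child takes 2, communication costs 3.
spread_no_dup = fork_makespan({0}, [0, 1], 1, 2, 3)       # 6: one child waits for data
serial_no_dup = fork_makespan({0}, [0, 0], 1, 2, 3)       # 5: children run back to back
with_dup = fork_makespan({0, 1}, [0, 1], 1, 2, 3)         # 3: parent copied to both
```

In this toy instance, running the parent on both processors removes the communication delay entirely and lets the two children run in parallel, which neither schedule without duplication can achieve.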