Provably Efficient Adaptive Scheduling for Parallel Jobs

Yuxiong HE 1, Wen Jing HSU 1, Charles E. LEISERSON 2
1 Nanyang Technological University
2 Massachusetts Institute of Technology

Abstract— Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level of fairness to user jobs. Varying degrees of success have been achieved over the years, but few existing schemes address both efficiency and fairness over a wide range of workloads. Moreover, in order to obtain analytical results, most of them require prior information about jobs, which may be difficult to obtain in real applications.

This paper presents two novel adaptive scheduling algorithms: GRAD for centralized scheduling, and WRAD for distributed scheduling. Both GRAD and WRAD ensure fair allocation under all levels of workload, and they offer provable efficiency without requiring prior information about a job's parallelism. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. To the best of our knowledge, they are the first non-clairvoyant scheduling algorithms that offer such guarantees. We also believe that our new approach of a resource request-allotment protocol deserves further exploration.

Specifically, both GRAD and WRAD are O(1)-competitive with respect to mean response time for batched jobs, and O(1)-competitive with respect to makespan for nonbatched jobs with arbitrary release times. The simulation results show that, for nonbatched jobs, the makespan produced by GRAD is no more than 1.39 times the optimal on average, and it never exceeds 4.5 times. For batched jobs, the mean response time produced by GRAD is no more than 2.37 times the optimal on average, and it never exceeds 5.5 times.
Index Terms— Adaptive scheduling, Competitive analysis, Data-parallel computing, Greedy scheduling, Instantaneous parallelism, Job scheduling, Makespan, Mean response time, Multiprocessing, Multiprogramming, Parallelism feedback, Parallel computation, Processor allocation, Span, Thread scheduling, Two-level scheduling, Space sharing, Trim analysis, Work, Work-stealing.

I. Introduction

Parallel computers are expensive resources that often must be shared among a large community of users. One major issue in parallel job scheduling is how to efficiently share the resources of a multiprocessor among a number of competing jobs, while ensuring each job a required quality of service (see e.g. [6], [7], [9], [11], [14], [16]–[19], [24], [27], [29], [31]–[35], [37], [43], [44]). Efficiency and fairness are two important design goals, where efficiency is often quantified in terms of makespan and mean response time.

This research was supported in part by the Singapore-MIT Alliance, and NSF Grants ACI-0324974 and CNS-0305606.

This paper summarizes several scheduling algorithms we developed. For the scheduling of individual jobs, our algorithms ensure short completion time and small waste; for the scheduling of job sets, they offer provable efficiency in terms of makespan and mean response time by allotting each job a fair share of processor resources. Moreover, our algorithms are non-clairvoyant [9], [14], [16], [24], i.e., they assume nothing about the release times, execution times, and parallelism profiles of jobs.

Parallel job scheduling can be implemented using a two-level framework [19]: a kernel-level job scheduler, which allots processors to jobs, and a user-level thread scheduler, which maps the threads of a given job to the allotted processors. The job scheduler may implement either space-sharing, where jobs occupy disjoint processor resources, or time-sharing, where different jobs may share the same processor resources at different times.
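To make the two-level framework concrete, the following is a minimal sketch, assuming a simple equal-partition space-sharing policy and a round-robin thread mapping; the function names and policies are illustrative assumptions, not the paper's actual algorithms.

```python
def job_scheduler(num_processors, jobs):
    """Kernel level (space-sharing sketch): partition the processors
    into disjoint, roughly equal shares, one share per job."""
    if not jobs:
        return {}
    share = num_processors // len(jobs)
    extra = num_processors % len(jobs)
    # The first `extra` jobs receive one leftover processor each.
    return {job: share + (1 if i < extra else 0)
            for i, job in enumerate(jobs)}

def thread_scheduler(threads, allotted):
    """User level: map one job's ready threads round-robin onto the
    processors allotted to that job."""
    mapping = {p: [] for p in range(allotted)}
    for i, thread in enumerate(threads):
        mapping[i % allotted].append(thread)
    return mapping
```

Because the shares are disjoint, each job's thread scheduler can operate independently on its own allotment, which is the defining property of space-sharing.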
Moreover, both the thread scheduler and the job scheduler may be either adaptive, allowing the number of processors allotted to a job to vary while the job is running, or nonadaptive (called “static” in [12]), where a job runs on a fixed number of processors over its lifetime. Our schedulers apply the two-level structure in the context of adaptive scheduling.

With adaptive scheduling [4] (called “dynamic” scheduling in [19], [30], [32], [46], [47]), the job scheduler can change the number of processors allotted to a job while the job executes. Thus, new jobs can enter the system, because the job scheduler can simply recruit processors from the already executing jobs and allot them to the new jobs. Without a suitable feedback mechanism, however, both adaptive and nonadaptive schedulers may waste processor cycles, because a job with low parallelism may be allotted more processors than it can productively use.

If individual jobs provide proper parallelism feedback to the job scheduler, this waste can be reduced. Therefore, at regular intervals (called quanta), each thread scheduler estimates its job's processor desire and provides it to the job scheduler; the job scheduler then allots processors to the jobs based on these requests. This feedback mechanism is called the request-allotment protocol. Since the future parallelism of a job is generally unknown, the challenge is to develop a request-allotment protocol that gives an effective way to estimate desire and allocate processors. Various researchers [13], [14], [22], [32] have used the notion of instantaneous parallelism — the number of processors the job can effectively use at the current
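The quantum-by-quantum request-allotment cycle described above can be sketched as follows. Both policies here are illustrative assumptions rather than GRAD's or WRAD's actual rules: desire estimation uses a simple multiplicative-increase, multiplicative-decrease rule driven by the last quantum's utilization, and the job scheduler grants each job at most its fair share of its desire, redistributing any leftover processors.

```python
def estimate_desire(prev_desire, utilization, threshold=0.8):
    """Thread-scheduler side (hypothetical rule): if the last
    allotment was well utilized, request more processors next
    quantum; otherwise request fewer."""
    if utilization >= threshold:
        return prev_desire * 2
    return max(1, prev_desire // 2)

def allot_processors(num_processors, desires):
    """Job-scheduler side (hypothetical rule): grant each job at
    most its fair share of its desire each round, redistributing
    leftover processors among still-unsatisfied jobs."""
    allotment = {job: 0 for job in desires}
    remaining = {job: d for job, d in desires.items() if d > 0}
    free = num_processors
    while free > 0 and remaining:
        share = max(1, free // len(remaining))
        for job in list(remaining):
            grant = min(remaining[job], share, free)
            allotment[job] += grant
            remaining[job] -= grant
            free -= grant
            if remaining[job] == 0:
                del remaining[job]
    return allotment
```

The key point the sketch illustrates is the division of labor: only the thread scheduler observes how the job actually used its processors, while only the job scheduler sees the system-wide contention, so each quantum they exchange exactly one desire and one allotment per job.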