In Proceedings of the 18th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2004), Santa Fe, New Mexico, USA, April 2004, 8 pages.

Queue Scheduling and Advance Reservations with COSY

Junwei Cao and Falk Zimmermann
C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany
{cao, falk}@ccrl-nece.de

Abstract

Most current job scheduling systems for supercomputers and clusters provide batch queuing support. With the development of metacomputing and grid computing, users require resources managed by multiple local job schedulers. Advance reservations are becoming essential for job scheduling systems that are to be used within a large-scale computing environment with geographically distributed resources. COSY is a lightweight implementation of such a local job scheduler with support for both queue scheduling and advance reservations. COSY queue scheduling uses the FCFS algorithm with backfilling mechanisms and priority management. Advance reservations with COSY can provide effective QoS support for an exact start time and a latest completion time. Scheduling policies are defined to reject reservations with too short a notice time, so that making a reservation offers no start time advantage over submitting to a queue. Experimental results further show that as the percentage of reservation requests grows, a longer mandatory shortest notice time for advance reservations must be applied in order not to sacrifice queue scheduling efficiency.

1. Introduction

Scheduling of parallel jobs on supercomputers and clusters has been an active research topic in the high performance computing community for over ten years [5]. Most current scheduling systems provide batch queuing support. One of the basic queue orders is first-come-first-served (FCFS), in which jobs are ordered by arrival time. The backfilling technique has proven effective at improving scheduling performance with only minor drawbacks and is thus widely used.
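As an illustration of FCFS ordering with backfilling, the following is a minimal sketch and not the COSY implementation. It assumes the common rule that a later job may start early only if doing so does not delay the job at the head of the queue. The names `Job` and `pick_jobs_to_start`, and the use of user-estimated run times, are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # user-estimated run time (assumed to be available)


def pick_jobs_to_start(queue, free_nodes, running, now):
    """Return the jobs to start at time `now`.

    queue   -- waiting jobs in FCFS order
    running -- list of (end_time, nodes) for jobs already running
    """
    queue = list(queue)
    running = list(running)
    started = []

    # Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job)
    if not queue:
        return started

    # The head job does not fit. Find the earliest time at which it will
    # (the "shadow time") and how many nodes will be left over then.
    head = queue[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started  # head job can never fit; do not backfill
    extra = avail - head.nodes

    # Backfill: a later job may start now if it fits in the free nodes and
    # either finishes before the shadow time or uses only leftover nodes.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.runtime <= shadow_time:
            free_nodes -= job.nodes
            started.append(job)
        elif job.nodes <= extra:
            free_nodes -= job.nodes
            extra -= job.nodes
            started.append(job)
    return started
```

In this sketch the head job keeps its earliest possible start time, which is why backfilling costs the head job nothing under accurate run-time estimates; inaccurate estimates are one source of the minor drawbacks noted above.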
With the development of metacomputing [11] and grid computing [7], users require access to multiple resources that may be distributed geographically. In general, a user would like to reserve all of the resources for a distributed application in advance so that the corresponding quality of service (QoS) requirements, e.g. the execution time, can be guaranteed.

COSY is a lightweight implementation of such a local job scheduler that plugs into the SCore [19] and NEC MPI [12] environments. The current COSY implementation supports both queue scheduling and advance reservations. COSY queue scheduling is accompanied by an aggressive backfilling mechanism that attempts to allocate currently unutilized nodes to jobs further back in the priority queue of waiting jobs without delaying the job at the head of the queue. Advance reservations with COSY can be used to guarantee an exact job start time or a latest completion time. Users can query a reservation and confirm it later within a given period of time. This two-phase commitment is designed so that COSY can be used in a metacomputing or grid computing environment with QoS negotiation requirements.

When queue scheduling is combined with advance reservations, users, especially those with lower priorities, may use advance reservations to gain start time advantages. If a user finds that an advance reservation leads to an earlier job start time than submitting to the queue, the user may choose to make reservations even though there are no explicit QoS requirements associated with the job. In a commercial environment, this can be prevented by charging more for advance reservations. In an academic or research environment, it has to be solved by applying proper scheduling policies. In the COSY scheduler described in this work, the issue is addressed by applying a shortest notice time to each reservation. A reservation can be accepted only if its notice time is longer than this threshold.
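The notice-time policy described above can be sketched as follows. This is a minimal illustration, not the COSY code: the function names are hypothetical, notice time is taken to be the gap between the request time and the requested start time, and the threshold is assumed to be the mean of historical queue wait times.

```python
from statistics import mean


def shortest_notice_time(historical_wait_times):
    """Threshold derived from the mean wait time of past queued jobs.

    Returning 0.0 when no history exists is an assumption of this sketch.
    """
    if not historical_wait_times:
        return 0.0
    return mean(historical_wait_times)


def accept_reservation(request_time, start_time, historical_wait_times):
    """Accept only if the notice time is strictly longer than the threshold."""
    notice_time = start_time - request_time
    return notice_time > shortest_notice_time(historical_wait_times)
```

For example, with historical wait times of 100, 200, and 300 seconds, a reservation requested at time 0 for a start at time 250 would be accepted, while one for a start at time 150 would be rejected and the job would have to go through the queue.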
The shortest notice time for a reservation is defined as the predicted wait time the reservation would incur if it were submitted as a queue job. The prediction is based on historical information and is defined using the mean wait time of queued jobs. Experiments in this work use a representative workload, which is generated using the Cirne and Berman archive [3] of parallel workload models included in [4]. Experimental results show that the existence of advance reservations still prolongs the queue wait time even when reservations gain no start time advantage. A longer shortest notice time must be applied for advance reservations in order not to sacrifice the queue