Towards Service Level Agreement Based Scheduling on the Grid *

Jon MacLaren, Rizos Sakellariou, Krish T. Krishnakumar
University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom

Jon Garibaldi and Djamila Ouelhadj
University of Nottingham, Jubilee Campus, Nottingham NG8 1BB, United Kingdom

Abstract

The orchestration of complex workflows on the Grid is emerging as a key goal for the Grid community. It is necessary for these workflows to be executed reliably, respecting any dependences; it is also desirable for the user to know (albeit approximately) when the workflow will complete. The authors argue that meeting these goals will necessitate a major shift in the underlying scheduling technology, which is ultimately used to execute any computational tasks contained in these workflows.

This position paper describes a recently funded project that aims to establish a fundamental new infrastructure for efficient job scheduling on the Grid, based on a notion of Service Level Agreements. These are negotiated between the client (user, superscheduler, or broker) and the scheduler, contain information on acceptable job start and end times, and may be re-negotiated during runtime.

Introduction

Within the Grid community at present, there is a keen focus on the management and scheduling of workflows, i.e. complex jobs consisting of multiple computational tasks, connected either in a Directed Acyclic Graph (DAG) or in a more general graph incorporating conditionals and branches.

In order for a workflow enactment engine, such as the GriPhyN Project’s Pegasus (PEGASUS 2004) or the UNICORE Grid middleware (UNICORE 2004), to successfully orchestrate these workflows, it must be possible to schedule multiple computational tasks onto (possibly) distributed resources, while still respecting any dependences in the workflow. Current methods employed to schedule work on compute resources within the Grid are unsuitable for this purpose.
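To make the phrase "respecting any dependences" concrete, the following is a minimal sketch of how the tasks of a DAG workflow might be released in dependence order. The task names and the `release_waves` helper are hypothetical and are not taken from Pegasus or UNICORE; the sketch only uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "merge": {"simulate_a", "simulate_b"},
}


def release_waves(deps):
    """Group tasks into waves; a task joins a wave only after all of
    its predecessors have been marked as done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with no unfinished predecessors
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave has completed
    return waves


waves = release_waves(workflow)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Tasks in the same wave have no dependences on one another and could, in principle, be placed on different resources at the same time; each later wave must wait for the earlier ones.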
Although there has been a considerable amount of work related to scheduling DAGs onto heterogeneous systems (Sakellariou & Zhao 2004; Topcuoglu, Hariri, & Wu 2002), this is not directly applicable in the Grid context since it assumes a static environment where execution costs are well known in advance; only limited attempts have been made to consider situations where run-time changes in the estimated costs may require some reconsideration of scheduling decisions (Zhao & Sakellariou 2004). As a result, at the moment, the enactment of a workflow has to rely on traditional, queue-based scheduling systems. Such systems (often called “batch” schedulers) provide only one level of service, namely ‘run this when it gets to the head of the queue’, which approximates to ‘whenever’. This uncertainty means that any workflow enactment engine must wait for components of the workflow to complete before beginning to schedule dependent components. This approach fails to hide the latencies resulting from the length of the job queues, which then control and determine the execution time of the workflow.

* This work is funded by the EPSRC Fundamental Computer Science for e-Science initiative (Grant GR/S67654/01), whose support we are pleased to acknowledge. Copyright © 2004, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

New patterns of usage arising from Grid computing (and other areas) have resulted in the introduction of advance reservation to these schedulers, where jobs can be made to run at a precise time. However, this is also an extreme level of service, and is excessive for many workflows, where often it would be sufficient to know the latest finish time, or perhaps the soonest start time and latest end time.
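The levels of service contrasted above can be viewed as progressively tighter constraints on a job's start and end times. The sketch below is illustrative only: the `TimeWindowSLA` class, its field names, and the `advance_reservation` helper are hypothetical and do not correspond to the project's design or to any existing scheduler API. Times are expressed in minutes from now.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeWindowSLA:
    """Hypothetical SLA terms; None means the bound is unconstrained."""
    earliest_start: Optional[float] = None
    latest_end: Optional[float] = None

    def admits(self, start: float, duration: float) -> bool:
        """Return True if a run of `duration` beginning at `start`
        stays within the agreed bounds."""
        if self.earliest_start is not None and start < self.earliest_start:
            return False
        if self.latest_end is not None and start + duration > self.latest_end:
            return False
        return True


def advance_reservation(start: float, duration: float) -> TimeWindowSLA:
    """Model a precise reservation as a window with zero slack."""
    return TimeWindowSLA(earliest_start=start, latest_end=start + duration)


batch = TimeWindowSLA()                                     # 'whenever'
deadline = TimeWindowSLA(latest_end=240)                    # latest finish time only
window = TimeWindowSLA(earliest_start=60, latest_end=240)   # soonest start, latest end
fixed = advance_reservation(start=60, duration=90)          # run at a precise time

# A 90-minute job starting at minute 100 fits everything except the fixed slot.
for name, sla in [("batch", batch), ("deadline", deadline),
                  ("window", window), ("fixed", fixed)]:
    print(name, sla.admits(start=100, duration=90))
```

Under this view, the queue-based service and the precise reservation sit at opposite extremes, and the intermediate windows leave the scheduler some freedom in where to place the job.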
Advance reservation (in its current form) is also unsuitable for any scenario involving the simultaneous scheduling of multiple computational tasks, either as a sequence or as tasks that must be co-scheduled. In such cases, a client (superscheduler or broker) must be able to negotiate simultaneously with a number of resources, only committing to suitable arrangements. Current advance reservation APIs only allow the request of a reservation, with a yes/no answer, where ‘yes’ denotes a booking. This is inadequate, as good guessing is required to book all the resources for suitable times, with the difficulty increasing exponentially with the number of resources.

Also, in its current form, advance reservation has several disadvantages for the resource owner. When an advance reservation is made, the scheduler must place jobs around this fixed job. Typically, this is done using backfilling (Lifka 1995), which increases utilisation by searching the work queues for small jobs that can plug the gaps. In practice, this rarely works perfectly, and so the scheduler must either leave the reserved processing elements empty for a time, or suspend or checkpoint active jobs near to the time of the reservation. These processes are not instantaneous; e.g. checkpointing a 64 processor Unified Weather Model job on an O3800 takes about 12 minutes, despite a small