Static Workload Balance Scheduling; Continuous Case

Sabin Tabirca, T. Tabirca, Len Freeman, Laurence Tianruo Yang§
Department of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9NG, UK
University College Cork, Department of Computer Science, College Road, Cork, Ireland
§Department of Computer Science, St. Francis Xavier University, P.O. Box 5000, Antigonish, NS, B2G 2W5, Canada

Abstract

This article studies a static scheduling method based on workload balancing in the continuous case. An equation is presented for the case when the workload, as a continuous function, is distributed equally onto processors using integrals. A sufficient condition is also established for the fully covering property. Finally, some computational results are given to show that the continuous case performs better than the discrete case.

1 Introduction

Parallel programming has been used intensively to solve problems with a large number of computations or large volumes of data. Such problems arise naturally in real-world applications (e.g. weather prediction) as well as in theoretical applications (e.g. differential equations). Loops represent an important source of parallelism and occur in almost all scientific applications. Many algorithms dealing with loop scheduling have been proposed so far.

Every loop scheduling method provides a mapping of the loop iterations to a number of processors. There are two important classes of loop scheduling. The first class contains static scheduling methods, which find this mapping at compile time. These can further be grouped into block methods, in which successive iterations are mapped to a processor, and cyclic methods, in which a processor receives loop iterations in a cyclic manner. A well-known static block scheduling method is uniform scheduling, for which all the chunks have almost the same size. The second class contains dynamic scheduling methods, which decide the mapping of the loop iterations at run time.
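Uniform static block scheduling, mentioned above, can be pictured as splitting n iterations into p contiguous chunks whose sizes differ by at most one. The following minimal sketch is illustrative only; the function name and the half-open-range convention are assumptions, not notation from the paper.

```python
# Sketch of uniform static block scheduling: n iterations are split into
# p contiguous chunks of almost equal size (sizes differ by at most one).
def uniform_block_bounds(n: int, p: int) -> list[tuple[int, int]]:
    """Return half-open iteration ranges [lo, hi) for processors 0..p-1."""
    return [(j * n // p, (j + 1) * n // p) for j in range(p)]

# Example: 10 iterations on 3 processors give chunks of sizes 3, 3 and 4.
print(uniform_block_bounds(10, 3))  # [(0, 3), (3, 6), (6, 10)]
```

Because the mapping is fixed entirely by n and p, it can be computed at compile time, which is exactly what distinguishes this class from the dynamic methods discussed next.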
* The corresponding email addresses are s.tabirca@cs.ucc.ie, {abircat, freeman}@cs.man.ac.uk, and lyang@stfx.ca

These methods use a set of queue structures and a strategy for taking loop iterations from them. The guided self-scheduling algorithms (Polychronopoulos et al. [7]) and some of their variants (Eager and Zahorjan [3], Hummel et al. [4], Lucco [6]) use a single queue. The number of chunks is considerably greater than the number of processors and decreases dynamically as the chunks are taken from the queue and assigned to processors. Potential loss of performance may be caused by overheads such as loss of data locality, inefficient unrolling and pipelining, etc. The affinity self-scheduling algorithms (Yang et al. [9]) avoid part of these overheads by maintaining one queue per processor.

A recent scheduling method, named Feedback Guided Dynamic Loop Scheduling, was proposed by Bull [1] to schedule a sequence of similar parallel loops. The method divides the loop iterations into p chunks at run time so that each processor receives a similar workload. Based on this approach, Tabirca et al. [8] proposed a static block scheduling method in which the workloads are approximately equally distributed onto processors. A major inconvenience of this method is that the upper bounds of the scheduling partition do not have simple equations. This article addresses this issue and shows how to obtain simpler bounds when the continuous case is used.

2 Static Loop Scheduling Based on Workload Balance: Discrete Case

Tabirca et al. [8] proposed a static scheduling method based on a balanced distribution of the workloads onto processors. The main results concerning this method are outlined in the following. Consider that there are p processors, denoted in the following by P_1, P_2, ..., P_p, and a single parallel loop. We also assume that the workload of the routine

0-7695-1926-1/03/$17.00 (C) 2003 IEEE
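The discrete balanced-workload idea described above (each of the p contiguous chunks should carry roughly 1/p of the total workload) can be sketched with prefix sums. This is only an illustrative greedy approximation under assumed names; it is not the authors' exact partition equations, whose bounds the paper notes are not simple.

```python
from itertools import accumulate

def balanced_bounds(w, p):
    """Given per-iteration workloads w, return p contiguous half-open
    chunks [lo, hi) whose workload sums are approximately equal
    (a greedy prefix-sum sketch, not the method of Tabirca et al.)."""
    prefix = list(accumulate(w))      # prefix[i] = w[0] + ... + w[i]
    total = prefix[-1]
    bounds, lo = [], 0
    for j in range(1, p):
        target = j * total / p
        # first index whose prefix sum reaches the j-th equal-share target
        hi = next(i for i, s in enumerate(prefix) if s >= target) + 1
        bounds.append((lo, hi))
        lo = hi
    bounds.append((lo, len(w)))
    return bounds

# Increasing workloads: the heavier tail gets a shorter chunk.
print(balanced_bounds([1, 2, 3, 4], 2))  # [(0, 3), (3, 4)]
```

The point of the continuous case developed in this paper is precisely to replace such index-by-index searches over discrete sums with closed-form bounds obtained from integrals.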