Development of a method of optimising data distribution on a loosely coupled multiprocessor system

A. Symons
V. Lakshmi Narasimhan

Indexing terms: Microprocessor systems, Data distribution, Load-balancing algorithms

Abstract: The maximum speedup of a multiprocessor system is limited by the sequential part of an algorithm, and in loosely coupled processor systems a large part of this sequentiality is caused by the communication between processors. As this communication depends on the distribution of data, the data distribution must be optimised in order to achieve the maximum speedup. In the paper the authors present a new method of determining the distribution for loosely coupled multiprocessors using a branch and bound technique based on the Moore-Skelboe interval arithmetic algorithm. The key issue of this load-balancing algorithm, namely the branch selection criterion, has been addressed. When this method is applied to a matrix multiplication algorithm running on a cluster of workstations, the optimal data distribution provides a significant performance increase of 44% over the equal distribution, which does not take into account communication overheads. Further, it is shown that, for a workstation cluster with random variations in processing speed, the execution time ratio of the equal and optimal distributions remains relatively unchanged. Thus the execution time of the optimal data distribution is no more sensitive to processor speed variation than the execution time of the equal distribution.

1 Introduction

As processors approach the physical limits on clock speed, more and more emphasis is being placed on efficient parallel systems. This leads to a great reliance on process scheduling and load balancing.
Indeed, dynamic and static load balancing of ray tracing in computer graphics have been previously considered by Green [7] and Badouel [2], where it has also been noted that parallel computers can offer high performance to computer graphics.

© IEE, 1996
IEE Proceedings online no. 19960467
Paper first received 18th November 1994 and in revised form 5th February 1996
A. Symons is with CITR, Queensland, Australia 4072
V. Lakshmi Narasimhan is with the Information Technology Division, DSTO, SA, Australia 5108
IEE Proc.-Comput. Digit. Tech., Vol. 143, No. 4, July 1996

Often one can develop a mathematical model of the algorithm running on a particular system for the purpose of load balancing. This model will include terms for synchronisation, communication and processing times, each of these terms being interdependent. Using this mathematical model of the system, it is desired to determine the optimal static scheduling which will result in minimum execution time. This is a form of the general assignment problem (GAP), i.e. assign M tasks to N agents so that minimal execution time is obtained. This problem has been shown to be NP-hard [3].

The mathematical model will often include nonlinear terms (e.g. for the matrix multiplication, the execution time is O(N^3)) and the final data distribution must be an integer solution, since a processor cannot do half a job. Therefore, this optimisation problem is a nonlinear integer programming problem, and to solve it a method is required that operates on the mathematical system model to generate the static schedule of tasks. Previous solutions to the integer programming problem either require constraints, tailored to the particular problem, to be added to the model to ensure integer results [3], require derivatives, which may not exist if the objective function is not smooth, or do not guarantee optimality [4].
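To make the integer nature of the problem concrete, the following C++ fragment is a minimal sketch of a toy instance. The cost model in `executionTime` is an assumption made purely for illustration and is not the model developed in this paper: a master sends rows to the workers one after another, so each worker starts computing only once its own rows and those of the workers before it have been sent. The names `executionTime`, `search`, `bestDistribution`, `commPerRow` and `workPerRow` are likewise hypothetical. The search is plain exhaustive enumeration, not the Moore-Skelboe technique, and its cost grows combinatorially with the number of rows and workers, so it is only usable as a reference for very small instances.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cost model (an assumption, not the paper's model): rows are
// sent to the workers sequentially, so worker i can finish no earlier than
// the time needed to send the rows of workers 0..i plus its own compute time.
double executionTime(const std::vector<int>& rows, double commPerRow,
                     double workPerRow) {
    double sent = 0.0;    // time at which the current worker has its data
    double finish = 0.0;  // latest finishing time seen so far
    for (int n : rows) {
        sent += commPerRow * n;
        finish = std::max(finish, sent + workPerRow * n);
    }
    return finish;
}

// Enumerate every way of splitting `remaining` rows among workers idx..end
// and remember the split with the smallest modelled execution time.
void search(std::vector<int>& rows, std::size_t idx, int remaining,
            double commPerRow, double workPerRow,
            std::vector<int>& best, double& bestTime) {
    if (idx + 1 == rows.size()) {
        rows[idx] = remaining;  // the last worker takes whatever is left
        double t = executionTime(rows, commPerRow, workPerRow);
        if (t < bestTime) {
            bestTime = t;
            best = rows;
        }
        return;
    }
    for (int n = 0; n <= remaining; ++n) {
        rows[idx] = n;
        search(rows, idx + 1, remaining - n, commPerRow, workPerRow,
               best, bestTime);
    }
}

// Brute-force reference solution for small problem sizes only.
std::vector<int> bestDistribution(int totalRows, int workers,
                                  double commPerRow, double workPerRow) {
    assert(workers > 0 && totalRows >= 0);
    std::vector<int> rows(workers, 0), best;
    double bestTime = std::numeric_limits<double>::infinity();
    search(rows, 0, totalRows, commPerRow, workPerRow, best, bestTime);
    return best;
}
```

Under this toy model, with 12 rows, three workers, `commPerRow = 1` and `workPerRow = 2`, the equal split {4, 4, 4} takes 20 time units, whereas the best integer split found by `bestDistribution(12, 3, 1.0, 2.0)` takes 18. Even in this small case, the equal distribution is not optimal once communication is accounted for.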
In this paper, a new method based on interval arithmetic is proposed and implemented using the C++ programming language. The method, called the Moore-Skelboe technique [5-7], has been proved by Rokne [8] to yield the guaranteed global minimum. Our implementation allows the model to be represented simply in terms of mathematical functions which need not be smooth. In addition, our algorithm can be used without the need to tailor the optimisation process to particular models, as the Moore-Skelboe algorithm is robust and acts directly on the objective function. Robustness is necessary because it has previously been noted [3] that, without robust code, good results cannot be achieved for most global optimisation problems.

The motivation for optimising the execution times obtained from the system model rests on two assertions, which we postulate based on the following assumptions:

1. The model for the system is accurate enough to predict the individual task's execution time.
2. Communication times are dependent on the topology of the interconnection network and the location of the particular processor.