International Journal of Computer Applications (0975-8887), Volume 83, No. 3, December 2013

Experimental Evaluation of the Performance of Processing Stealing Technique: A Scalable Load Balancing Technique for a Dynamic Multiprocessor System

O. O. Olakanmi
Electrical and Electronic Engineering, University of Ibadan, Nigeria

O. A. Fakolujo (Ph.D)
Electrical and Electronic Engineering, University of Ibadan, Nigeria

ABSTRACT

This paper reports a preliminary experimental evaluation of the Processing Elements Stealing (PE-S) technique, which is intended as an efficient and scalable load balancing technique for dynamically structured multiprocessor systems. The multiprocessor system is modelled as a dynamic cluster-based multiprocessor. Each cluster is a node in a symmetric multiprocessor architecture, and the number of Processing Elements (PEs) in each cluster is determined dynamically at runtime. The PE-S technique dynamically computes a configuration ratio from the number of threads in the dynamically assigned tasks and uses it to generate the new number of PEs for each cluster. This new configuration ratio is then used to balance the additional computational work generated by runtime instantiation of the current workloads for each cluster. In this work, the efficiency of PE-S was evaluated using memory traces of some tightly parallel applications in which the amount of parallelism is parameterized. These traces were used as workloads on two simulation setups: a dynamic multiprocessor with PE-S and a dynamic multiprocessor without PE-S. This makes it possible to evaluate the performance of the PE-S load balancing technique on the targeted multiprocessor. The efficiency of PE-S reconfigurations was also compared with other possible reconfiguration ratios.
The experimental results showed that the load balancing algorithm is efficient and scalable for balancing tasks of at least 100,000 instructions, and that PE-S generated ratios are, on average, better than any other reconfiguration ratios.

General Terms
Parallel computing, Load balancing, multiprocessor

Keywords
Load balancing, multiprocessor, parallel application, work stealing and sharing, processing element stealing

1. INTRODUCTION

The rapid development of hardware and software technologies has led to increased interest in the use of multiprocessor systems for online database, real-time, defence strategy systems, and power-intensive commercial applications. One of the major problems of multiprocessor systems is how to evenly distribute (or schedule) processes among processing elements to achieve some performance goal(s), such as minimizing execution time, minimizing communication delays, and/or maximizing resource utilization. Load balancing has therefore become an integral factor in maximising the speed-up of parallel and distributed environments.

In recent times, multiprocessor systems have been a subject of interest. Recent research has shown that uniprocessor technology can hardly be improved much further and so can no longer meet the processing power requirements of current applications. This is due to users' insatiable demand for computing power, which arises from the development of powerful applications built to meet that demand. Parallel and distributed processing offers a solution by combining many processing elements so that they behave as a single processor. Research is still ongoing on how to remove some of the performance bottlenecks in multiprocessor systems by adapting some computer network speed-up metrics to multiprocessor architecture.
For example, different network topologies have been modelled, evaluated and implemented in multiprocessor systems, which has produced a variety of multiprocessor architectures. In addition, the concepts of memory hierarchy and optimum scheduling techniques have been introduced to achieve efficient multiprocessor systems. In spite of all these improvements, multiprocessor system performance is still marred by inefficient job distribution during execution, which affects the overall throughput of the systems. Some research has been done, and much is still ongoing, on how to obtain a perfect load balancing technique; however, such a technique has remained elusive. One of the biggest performance issues in current load balancing techniques is that they are system specific, and some loads have more affinity for certain processing elements than for others. This mars the performance gain of most of the available load balancing techniques.

Many techniques for load balancing in multiprocessor systems have been proposed. Prominent among them are work stealing and work sharing. Recently, another technique, called Processing Elements (or worker) stealing, was proposed in [15]. This paper presents an experimental evaluation of this technique in terms of its influence on speed-up when it is implemented on a dynamic multiprocessor. The rest of the paper is organized as follows. Section 2 reviews related research on load balancing in multiprocessor architectures. Section 3 describes the background work on the PE-S technique. The experimental evaluation of the PE-S technique is described in Section 4, and the simulation results are discussed in Section 5.

2. RELATED WORKS

Many load balancing algorithms have been proposed for parallel and cloud computing to prevent load imbalance [3][5][8][12][15][18].
Each of these algorithms uses a different technique to achieve load balancing among the processing elements. The work stealing and work sharing techniques have gained tremendous popularity due not only to their provable efficiency