MPI-Based Adaptive Task Migration Support on the HS-Scale System

Saint-Jean N., Benoit P., Sassatelli G., Torres L., Robert M.
University of Montpellier II, LIRMM, UMR 5506
<name>@lirmm.fr

Abstract

Scalability of the architecture, the programming model and task control management will be a major challenge for future VLSI systems. In this context, the homogeneous MPSoC is an attractive approach because it is intrinsically scalable. HS-Scale is a contribution in this domain and has already been published in [1,2]. In this article, we present an original MPI-based adaptive task migration support for the HS-Scale system. Our previous communication API was modified to make it MPI compliant. To enable task migration without any MMU, a Position Independent Code compilation technique is implemented. Self-adaptability relies on monitoring information collected at run-time by each processing element (PE). Each PE is endowed with the same decisional capability, which ensures the scalability of the solution. An MJPEG case study validated on a multi-FPGA prototyping platform is presented. The observed dynamic behavior of HS-Scale shows that the system is able to find by itself a stable task placement that provides the best performance in terms of processing throughput.

1. Introduction

Multi-Processor Systems-on-Chip (MPSoCs) are becoming an increasingly popular solution that combines the flexibility of software with potentially significant speedups. These complex systems usually integrate a few mid-range microprocessors onto which an application is statically mapped at design-time. These applications, however, tend to grow in complexity and often exhibit time-varying workloads, which makes static mapping decisions sub-optimal in a number of scenarios.

Our previous work, presented in [1] and [2], aimed at exploring and defining principles that provide both hardware and software scalability; the resulting system is named HS-Scale.
The hardware architecture, H-Scale, is a homogeneous MPSoC based on RISC processors, distributed memories and an asynchronous network on chip. S-Scale is a programming model supported at run-time by a compact operating system whose main roles are to schedule tasks and to manage memory and inter-task communications. Several experiments were conducted on HS-Scale with the implementation of several applications: FIR, DES and MJPEG. The results showed the importance of task placement on the architecture for tasks with differing regularity and granularity. One very important result was the correlation observed between performance and task distribution, which led us to envision adaptive strategies that automate task management and distribute it over the system.

In this paper, our new contribution is the complete implementation of an adaptive migration support based on the Message Passing Interface (MPI) programming model. The HS-Scale system now relies on a set of adaptive principles that endow the architecture with decisional capabilities. Based on a distributed monitoring scheme, each processing element is able to migrate its tasks automatically, following a customizable policy. These mechanisms were developed in a fully decentralized fashion in order to preserve the scalability of our solution. Another original aspect of our contribution is that task migration is performed without any MMU (Memory Management Unit); it is enabled instead by PIC (Position Independent Code) compilation.

This paper is organized as follows. Section 2 presents related work in the field of task migration techniques for MPSoC systems. Section 3 summarizes our previous work on the hardware architecture, H-Scale. In section 4, the operating system and the programming model based on MPI are presented.
Our migration support is then presented in section 5; the dynamic task loading based on PIC compilation is detailed in particular, and the run-time adaptive mechanisms are then described. In section 6, we present the basis of our experiments, a multi-board prototyping platform, and we present and discuss the results obtained on a case study, MJPEG.

2. Related works

Our new contribution is the implementation of a fully adaptive task migration support based on the MPI programming model. Task migration has been studied in the literature for both shared-memory and distributed-memory systems, as the following paragraphs show.

For shared-memory systems such as today's regular multi-core PCs, migration is facilitated by the fact that no data has to be moved across several physical memories; several efficient implementations exist in general-purpose operating systems such as Windows or Linux [3]. Task migration has also been explored for MPSoCs, notably on the basis of locality considerations [4], in order to decrease communication overhead or power consumption [5]. In [6], the authors present a migration case study for MPSoCs that relies on the µClinux operating system and a checkpointing mechanism. The system uses the MPARM framework [7], and although several memories are used, the whole system maintains data coherency through a shared-memory view of the system.

Migration on message-passing systems is generally a more difficult problem, since both process code and state have to be moved from one processor to another, and synchronizations must

IEEE Computer Society Annual Symposium on VLSI 978-0-7695-3170-0/08 $25.00 © 2008 IEEE DOI 10.1109/ISVLSI.2008.87