Task Migration for Dynamic Power and Performance Characteristics on Many-Core Distributed Operating Systems

Simon Holmbacka, Wictor Lund, Sébastien Lafond, Johan Lilius
Department of Information Technologies, Åbo Akademi University
Joukahaisenkatu 3-5, FIN-20520 Turku
Email: firstname.lastname@abo.fi

Abstract—Spatial locality of task execution will become more important on future hardware platforms, since the number of cores is steadily increasing. The large number of cores requires more intelligent power management due to the notion of spatial locality, and the high chip density requires increased thermal awareness in order to avoid thermal hotspots on the chip. At the same time, high CPU performance is only achieved by parallelizing tasks over the chip in order to fully utilize the hardware. This paper presents a task migration mechanism for distributed operating systems running on many-core platforms. In this work, we evaluate the performance and energy efficiency of an implemented task migration mechanism. We demonstrate this by parallelizing tasks when the performance of a single core is not sufficient, and by collecting tasks onto as few cores as possible when the CPU load is low. The task migration mechanism is implemented as a library for FreeRTOS using 1300 lines of code, and introduces a total task migration overhead of 100 ms on a shared memory platform. With the presented task migration mechanism, we intend to improve the dynamism of power and performance characteristics in distributed many-core operating systems.

Keywords-Task Migration, Distributed Operating Systems, Many-Core Systems, ARM Cortex-A9

I. INTRODUCTION

Spatial locality of resources provides a measurement of the distance between executing tasks and their resources. The value is proportional to the communication delay introduced between communicating tasks due to their spatial separation.
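The proportionality between distance and delay can be made concrete with a small cost model. The sketch below assumes a 2D mesh NoC with dimension-ordered (XY) routing, where the number of router hops between two cores equals their Manhattan distance, and scores a task-to-core mapping as the sum of messages times hops over all communicating task pairs. The mesh size, row-major core numbering, the names `hop_distance`, `Flow` and `mapping_cost`, and the message counts are all hypothetical illustrations, not the model or data used in this paper.

```python
from dataclasses import dataclass

MESH_WIDTH = 4  # hypothetical 4x4 mesh, cores numbered row-major 0..15


def hop_distance(core_a: int, core_b: int, width: int = MESH_WIDTH) -> int:
    """Router hops between two cores under XY routing (Manhattan distance)."""
    ax, ay = core_a % width, core_a // width
    bx, by = core_b % width, core_b // width
    return abs(ax - bx) + abs(ay - by)


@dataclass(frozen=True)
class Flow:
    """Traffic between two communicating tasks (hypothetical message count)."""
    src_task: int
    dst_task: int
    messages: int


def mapping_cost(task_to_core: dict, flows: list) -> int:
    """Total communication cost of a mapping: sum of messages * hops per flow."""
    return sum(
        f.messages * hop_distance(task_to_core[f.src_task], task_to_core[f.dst_task])
        for f in flows
    )


# Three tasks forming a pipeline 0 -> 1 -> 2.
flows = [Flow(0, 1, 100), Flow(1, 2, 100)]
scattered = {0: 0, 1: 15, 2: 3}  # tasks spread to distant cores of the mesh
adjacent = {0: 5, 1: 6, 2: 7}    # tasks on neighbouring cores in one row

print(mapping_cost(scattered, flows))  # 900
print(mapping_cost(adjacent, flows))   # 200
```

Placing the three pipeline tasks on neighbouring cores reduces the cost from 900 to 200 hop-weighted messages in this toy model; when communication patterns change at runtime, only task migration can move an existing mapping from the first situation to the second.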
In a many-core Network-on-Chip (NoC) processor, this overhead is clearly noticeable, as messages must propagate along the routing network of the chip. To keep the communication overhead as small as possible when using inter-core communication, communicating tasks should be placed as close to each other as possible. In a static system, an optimal mapping of tasks can be determined at compile time, but in a general-purpose computer with dynamic task creation, varying execution times, task suspension, etc., tasks should migrate on the chip at runtime to obtain the smallest communication overhead.

System performance is usually improved by mapping the tasks of parallel applications onto multiple cores to increase hardware utilization, since multiple processing elements can then execute separate parts of the application in parallel. On the other hand, performance improvements usually come at the cost of energy. In contrast to parallelizing tasks, collecting them onto only a few cores allows sleep-state-based power management to shut down idle cores and create a more energy-efficient system. In both cases, tasks must be movable at runtime for the system to be able to trade off energy against performance.

Another important issue caused by the locality of task execution is the thermal balance inside the chip [1], [2]. By changing the location of task execution on the chip, it is possible to avoid thermal hotspots, which can gradually wear out the chip. Previous work has addressed task scheduling and heat distribution on the chip. Figure 1 shows an example of how the mapping of tasks affects the thermal gradient of the CPU. The left part of the figure shows a highly parallelized mapping in which the temperature is more evenly balanced, while the right part shows a mapping that concentrates tasks on only a few CPU cores and forms a red hotspot. The figure makes clear that, on many-core systems, task mapping affects the temperature and hotspots on the chip based on the spatial locations of the tasks. This effect will be even more pronounced in 3D chips [4], since heat-producing elements spread out in three dimensions.

Figure 1. Thermal gradients of a many-core chip (red means hot). The labels (g,m) denote core group g and core number m [3].

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing 1066-6192/12 $26.00 © 2012 IEEE DOI 10.1109/PDP.2013.52 310

Task migration is the technique required to re-map tasks on a CPU, and thus to enable the aforementioned dynamism. In this paper we present the implementation of a checkpoint-based task migration mechanism for homogeneous many-core systems with shared memory. We show how task migration can be used to improve performance and create a more energy-efficient system. Furthermore, the overhead of