Task Migration for Dynamic Power and Performance Characteristics on Many-Core
Distributed Operating Systems
Simon Holmbacka, Wictor Lund, Sébastien Lafond, Johan Lilius
Department of Information Technologies,
Åbo Akademi University
Joukahaisenkatu 3-5 FIN-20520 Turku
Email: firstname.lastname@abo.fi
Abstract—Spatial locality of task execution will become more
important on future hardware platforms since the number
of cores is steadily increasing. The large number of cores
requires more intelligent power management due to the notion
of spatial locality, and the high chip density requires an
increased thermal awareness in order to avoid thermal hotspots
on the chip. At the same time, high performance of the CPU is
only achieved by parallelizing tasks over the chip in order to
fully utilize the hardware. This paper presents a task migration
mechanism for distributed operating systems running on many-
core platforms. In this work, we evaluate the performance and
energy efficiency of an implemented task migration mechanism.
This is shown by parallelizing tasks when the performance of
a single core is not sufficient, and by collecting tasks onto as
few cores as possible when the CPU load is low. The task migration
mechanism is implemented as a library for FreeRTOS using
1300 lines of code, and introduces a total task migration
overhead of 100 ms on a shared memory platform. With the
presented task migration mechanism, we intend to improve
the dynamism of power and performance characteristics in
distributed many-core operating systems.
Keywords-Task Migration, Distributed Operating Systems,
Many-Core Systems, ARM Cortex-A9
I. INTRODUCTION
Spatial locality of resources provides a measurement of
the distance between executing tasks and their resources. The
value is proportional to the communication delay introduced
between the communicating tasks due to spatial separation.
In a many-core Network-on-Chip (NoC) processor, this over-
head is clearly noticed as the messages need to propagate
along the routing network of the chip. To keep the
communication overhead of inter-core communication as small
as possible, the communicating tasks should be placed as
close as possible to each other. In a static system, an optimal
mapping of tasks can be computed at compile time, but in a
general-purpose computer with dynamic task creation,
execution times, suspension, etc., the tasks should
migrate on the chip during runtime to obtain the smallest
communication overhead.
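As a minimal sketch of how the distance between communicating tasks could be quantified, the following C fragment computes the hop count between two cores and selects the free core closest to a communication partner. It assumes a 2D-mesh NoC with row-major core numbering and dimension-ordered routing; the topology and the names core_position, hop_distance, and nearest_free_core are illustrative assumptions, not part of the presented mechanism.

```c
#include <assert.h>
#include <stdlib.h>

/* ASSUMPTION: cores form a 2D mesh and are numbered row by row,
 * with mesh_cols cores per row. This topology is illustrative only. */
typedef struct {
    int x; /* column in the mesh */
    int y; /* row in the mesh */
} core_pos_t;

/* Convert a linear core id into mesh coordinates. */
core_pos_t core_position(int core_id, int mesh_cols)
{
    assert(core_id >= 0 && mesh_cols > 0);
    core_pos_t pos = { core_id % mesh_cols, core_id / mesh_cols };
    return pos;
}

/* Number of router hops between two cores (Manhattan distance),
 * used here as a proxy for the communication delay. */
int hop_distance(int core_a, int core_b, int mesh_cols)
{
    core_pos_t a = core_position(core_a, mesh_cols);
    core_pos_t b = core_position(core_b, mesh_cols);
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Return the free core with the fewest hops to partner_core,
 * or -1 if every core is busy. core_busy[c] != 0 marks core c as busy. */
int nearest_free_core(int partner_core, const int *core_busy,
                      int num_cores, int mesh_cols)
{
    int best = -1;
    int best_dist = 0;
    for (int c = 0; c < num_cores; c++) {
        if (core_busy[c]) {
            continue;
        }
        int d = hop_distance(partner_core, c, mesh_cols);
        if (best < 0 || d < best_dist) {
            best = c;
            best_dist = d;
        }
    }
    return best;
}
```

A runtime mapper could call nearest_free_core with the core of a task's communication partner to pick a migration target; ties are resolved in favor of the lowest core number.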
System performance is usually improved by mapping
tasks in parallel applications on multiple cores in order to
improve the hardware utilization, since multiple processing
elements are then capable of executing separate parts of
the application in parallel. On the other hand, performance
improvements are usually achieved at the cost of increased
energy consumption. In contrast to parallelizing tasks, collecting
them onto only a few cores allows sleep-state-based power
management to shut down idle cores and create a more
energy efficient system. In both cases, tasks must be movable
during runtime in order for the system to be able to trade
energy against performance.
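The choice between spreading and collecting tasks can be illustrated with a small sketch that derives how many cores to keep awake from the total CPU load. The function name cores_needed and the per-core utilization cap are hypothetical and chosen for illustration; they do not describe the policy used in this work.

```c
#include <assert.h>

/* Number of cores to keep active for a given total load.
 *
 * total_load_pct   : summed CPU demand of all tasks, in percent of one core
 * per_core_cap_pct : ASSUMED utilization cap per core (for example 80)
 * num_cores        : cores available on the chip
 *
 * A low load yields one core (collect tasks, let the rest sleep);
 * a high load yields more cores (parallelize), up to num_cores. */
int cores_needed(int total_load_pct, int per_core_cap_pct, int num_cores)
{
    assert(total_load_pct >= 0 && per_core_cap_pct > 0 && num_cores > 0);

    /* Ceiling division: smallest core count that keeps every core
     * at or below the cap. */
    int n = (total_load_pct + per_core_cap_pct - 1) / per_core_cap_pct;

    if (n < 1) {
        n = 1;          /* at least one core must stay awake */
    }
    if (n > num_cores) {
        n = num_cores;  /* cannot use more cores than exist */
    }
    return n;
}
```

If the returned count is lower than the number of currently active cores, tasks would be migrated off the surplus cores so that those cores can enter a sleep state; if it is higher, tasks would be migrated onto additional cores.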
Another important issue caused by the locality of task
execution is the thermal balance inside the chip [1], [2].
By changing the location of task execution on the chip, it is
possible to avoid thermal hotspots which can gradually wear
out the chip. Work has previously been done in terms of task
scheduling and heat distribution on the chip. Figure 1 shows
an example of how the mapping of tasks affects the thermal
gradient of the CPU. The left part of the figure shows a
highly parallelized mapping in which the temperature is
more evenly balanced, while the right part shows a mapping
which concentrates tasks to only a few CPU cores and forms
a red hotspot. From the figure it is clear that task mapping
on many-core systems affects the temperature and hotspots
on the chip based on the spatial locations of the tasks. This
effect will show even more clearly in 3D chips [4] since heat
producing elements will spread out in three dimensions.
Figure 1. Thermal gradients of a many-core chip (red means hot). The
labels (g,m) denote core group g and core number m [3].
Task migration is the technique required to re-map tasks
on a CPU at runtime, and thus enables the aforementioned dynamism.
In this paper we present the implementation of a task
migration mechanism using checkpoints for homogeneous
many-core systems with shared memory. We show how task
migration can be used to improve performance and create a
more energy efficient system. Furthermore, the overhead of
2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
1066-6192/12 $26.00 © 2012 IEEE
DOI 10.1109/PDP.2013.52