Samarth Kaushik, Amit Kumar Singh, Thambipillai Srikanthan
Centre for High Performance Embedded Systems, Nanyang Technological University, Singapore
{samarth2, amit0011, astsrikan}@ntu.edu.sg
Abstract— Design-time strategies are suited only for
mapping a predefined set of applications and thus cannot
account for dynamic behavior. This dynamism demands
run-time mapping of application tasks to maintain a
critical balance between performance and resource
optimization. This paper proposes a run-time heuristic
that distributes the application tasks among multiple
processors, taking communication overhead, computation
load and resource utilization into consideration.
Keywords: Multiprocessor System-on-Chip (MPSoC),
Network-on-Chip (NoC), Mapping Algorithms.
I. INTRODUCTION
System-on-Chip (SoC) design is experiencing a radical
shift from uni-processor architecture to multi-processor
architecture in order to keep pace with the ever-increasing
demand for high performance. The rising complexity of
real-life applications cannot be addressed by simply trying
to make single-core processors run faster; instead, it requires
multiple processors, connected by a Network-on-Chip
(NoC), which can communicate cohesively and provide
increased concurrency [1].
The challenge is to map the parallelized tasks of an
application onto an MPSoC platform, which entails a judicious
mechanism for mapping these tasks onto various processing
elements (PEs), either at design-time or at run-time.
Numerous design-time mapping techniques have been
developed, but they are limited to a predefined set of
applications and are unaware of run-time resource
management [2], whereas run-time mapping techniques can
be applied to a large number of applications and
incorporate run-time resource management. In [3],
Holzenspies et al. propose a run-time strategy for mapping
inherently parallel streaming applications on MPSoC. Singh
et al. [4] describe a communication-aware run-time mapping
heuristic for MPSoC platforms accommodating multiple
tasks on a single PE. The heuristic tries to minimize the
communication overhead between two highly
communicating tasks by mapping them on the same PE.
However, existing heuristics do not attempt to balance the
computation load on each PE utilized for mapping and also
involve a restricted approach for minimizing
communication overhead.
We present a run-time task mapping technique that
reduces computation load variance and delivers
substantial performance improvements along with efficient
resource utilization.
II. PROPOSED ALGORITHM
Our technique pre-processes the application graph before
the actual mapping in order to reduce the communication
overhead and improve load balancing across the platform
PEs, taking the memory available on each PE into
consideration.
Application Model. An application is modeled as a set of
communicating parallel processes represented as a task
graph. The task graph is denoted as a directed graph ATG =
(T, E), where T is the set of application tasks and E is the set
of all edges in the application, connecting the tasks and
representing their communication as shown in Figure 1. A
task $t_i \in T$ is represented as $(t_{id}, t_{comp})$, where
$t_{id}$ is the task identifier and $t_{comp}$ is the task
computation load in cycles. An edge $e_i \in E$ connecting
two tasks contains communication information ($t_{comm}$)
between the tasks. $t_{comm}$ represents the number of cycles
taken for transferring a single token when full channel
bandwidth is available.
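The application model above can be written down as a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Task` and `TaskGraph`, the helper `example_graph`, and all cycle counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """A task t_i in T, represented as (t_id, t_comp)."""
    t_id: int    # task identifier
    t_comp: int  # computation load in cycles


@dataclass
class TaskGraph:
    """Directed application task graph ATG = (T, E)."""
    tasks: dict = field(default_factory=dict)  # t_id -> Task
    edges: dict = field(default_factory=dict)  # (src_id, dst_id) -> t_comm

    def add_task(self, t_id, t_comp):
        self.tasks[t_id] = Task(t_id, t_comp)

    def add_edge(self, src, dst, t_comm):
        # t_comm: cycles to transfer one token at full channel bandwidth
        if src not in self.tasks or dst not in self.tasks:
            raise KeyError("both endpoints of an edge must be tasks in T")
        self.edges[(src, dst)] = t_comm


def example_graph():
    """Build a three-task chain; all numbers are made up for illustration."""
    atg = TaskGraph()
    atg.add_task(0, 400)
    atg.add_task(1, 250)
    atg.add_task(2, 300)
    atg.add_edge(0, 1, 20)  # heavily communicating pair
    atg.add_edge(1, 2, 5)   # lightly communicating pair
    return atg
```

Storing edges in a dictionary keyed by the (source, destination) pair keeps the direction of each edge explicit and makes the per-edge $t_{comm}$ value a single lookup.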
Platform Model. The MPSoC architecture is a graph AG
= (P, C), where P is the set of PEs, each identified by its
identifier $p_{id}$, and C represents the on-chip
communication channels interconnecting the PEs. The PEs
are connected in a 4×4 mesh topology by a NoC. Among the
available PEs, one is used as the Manager Processor, which is
responsible for managing task operations and resource
usage, including run-time management of task loads.
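A small sketch of the platform graph AG = (P, C) may make the model more concrete. It rests on several assumptions that the text does not state: PEs are numbered in row-major order, PE 0 is taken as the Manager Processor, channels are treated as undirected links between mesh neighbours, and the Manager Processor is assumed not to host application tasks. The class name `Platform` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """Mesh platform AG = (P, C); defaults give the 4x4 mesh from the text."""
    rows: int = 4
    cols: int = 4
    manager_id: int = 0  # assumption: the text does not say which PE manages

    def pes(self):
        """Set P: identifiers p_id in row-major order (assumed numbering)."""
        return list(range(self.rows * self.cols))

    def worker_pes(self):
        """PEs left for application tasks, assuming the manager hosts none."""
        return [p for p in self.pes() if p != self.manager_id]

    def channels(self):
        """Set C: one undirected link per pair of adjacent mesh nodes."""
        links = set()
        for r in range(self.rows):
            for c in range(self.cols):
                p = r * self.cols + c
                if c + 1 < self.cols:
                    links.add((p, p + 1))          # link to the east neighbour
                if r + 1 < self.rows:
                    links.add((p, p + self.cols))  # link to the south neighbour
        return links
```

Under these assumptions a 4×4 mesh has 16 PEs, 15 of which remain available for application tasks, and 24 neighbour links.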
Mapping. Task mapping is represented by the function
$mpg: t_i \in T \rightarrow p_i \in P$, which maps each task
of the application onto the platform PEs.
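As an illustration of the mapping function, the sketch below stores mpg as a plain dictionary from task identifier to PE identifier, checks that it is a total function into P, and derives the per-PE computation load that a load-balancing heuristic would compare. The function names, the example task loads, and the use of population variance over the utilized PEs are assumptions made for this sketch, not definitions taken from the paper.

```python
from collections import defaultdict


def validate_mapping(mpg, task_ids, pe_ids):
    """Check that mpg assigns every task in T to one existing PE in P."""
    missing = set(task_ids) - set(mpg)
    if missing:
        raise ValueError(f"unmapped tasks: {sorted(missing)}")
    unknown = {p for p in mpg.values() if p not in pe_ids}
    if unknown:
        raise ValueError(f"unknown PEs: {sorted(unknown)}")
    return True


def pe_loads(mpg, t_comp):
    """Sum the computation load (cycles) of the tasks placed on each PE."""
    loads = defaultdict(int)
    for t_id, p_id in mpg.items():
        loads[p_id] += t_comp[t_id]
    return dict(loads)


def load_variance(loads):
    """Population variance of the load over the PEs that are actually used."""
    values = list(loads.values())
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical example: three tasks, two of them sharing PE 1.
t_comp = {0: 400, 1: 250, 2: 300}  # made-up cycle counts
mpg = {0: 1, 1: 1, 2: 2}           # task id -> PE id
```

Because several tasks may map to the same PE, the dictionary is many-to-one, and the load of a PE is the sum of the computation loads of all tasks assigned to it.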
A. Pre-Processing
The technique tries to minimize communication latencies
among the tasks of the application while simultaneously
balancing the processing load across the PEs. The
scheme starts by targeting the communication-intensive
edges in the application and attempts to merge the highly
communicating tasks they connect onto the same PE. The
merging operation takes place only if the memory constraint
of the involved PE is satisfied, i.e., the PE must have
sufficient memory to accommodate both tasks and the shared
memory for their local communication. The shared memory
holds the communication data carried on the edge connecting
the two tasks. The proposed strategy forms a global
approach because the complete application graph is considered
in its entirety when removing communication bottlenecks, in
contrast to the mapping technique in [4], where merging of
communicating tasks takes place during execution. The main purpose of
Preprocessing-based Run-time Mapping of Applications on NoC-based MPSoCs
2011 IEEE Computer Society Annual Symposium on VLSI
978-0-7695-4447-2/11 $26.00 © 2011 IEEE
DOI 10.1109/ISVLSI.2011.43