384 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH 2009
Special Section Briefs
A Case Study for NoC-Based Homogeneous
MPSoC Architectures
Sergio V. Tota, Mario R. Casu, Massimo Ruo Roch,
Luca Macchiarulo, and Maurizio Zamboni
Abstract—The many-core design paradigm requires flexible and
modular hardware and software components to provide the required
scalability to next-generation on-chip multiprocessor architectures. A
multidisciplinary approach is necessary to consider all the interactions
between the different components of the design. In this paper, a com-
plete design methodology that tackles at once the aspects of system level
modeling, hardware architecture, and programming model has been suc-
cessfully used for the implementation of a multiprocessor network-on-chip
(NoC)-based system, the NoCRay graphic accelerator. The design, based
on 16 processors, after prototyping with field-programmable gate array
(FPGA), has been laid out in 90-nm technology. Post-layout results show
very low power and area, as well as a clock frequency of 500 MHz. Results show
that an array of small and simple processors outperforms a single high-end
general-purpose processor.
Index Terms—Multiprocessor systems-on-chip (MP-SoC), network-on-
chip (NoC).
I. INTRODUCTION
THE unrelenting trend toward higher computation performance
has led so far to an increase in the complexity and the number
of the functional units of single monolithic microprocessors. Recently,
this trend has started to slow down even if the number of transistors is
expected to continue to double every three years [1]. Power-thermal
issues as well as design complexity have begun to limit the perfor-
mance growth-rate compared with the increasing number of transis-
tors available in a single die [2]. One way to cope with this produc-
tivity gap is the “tile-design” concept which underlies a simple yet ef-
fective paradigm: parallelization through replication of many identical
blocks placed each in a tile of a regular array fabric. Instead of fo-
cusing on improving the complexity of a single block, the solution aims
at delivering performance through several replicas of the same basic
blocks. This approach has the major positive consequence of making
system design a matter of instantiation capability instead of architecture
complexity, an objective which has to be pursued through innovative
scalable hardware/software solutions. The resulting architecture
can certainly be seen as an on-chip multiprocessor system. Therefore,
we will refer to such a system as a “homogeneous” multiprocessor
system-on-chip (MP-SoC), although the recent literature seems to reserve
the MP-SoC acronym for the case of “heterogeneous” processors.
MP-SoC design is a multidisciplinary research activity that encom-
passes on-chip communication infrastructures, microprocessor archi-
tectures, programming models, codesign/cosimulation flows and flex-
ible methodologies for system level modeling and exploration.
Manuscript received December 10, 2007; revised April 04, 2008. First pub-
lished February 06, 2009; current version published February 19, 2009.
S. V. Tota, M. R. Casu, M. R. Roch, and M. Zamboni are with the Dipar-
timento di Elettronica, Politecnico di Torino, I-10129 Torino, Italy (e-mail:
sergio.tota@polito.it; mario.casu@polito.it; massimo.ruoroch@polito.it; mau-
rizio.zamboni@polito.it).
L. Macchiarulo is with the Department of Electrical Engineering, University
of Hawaii, Honolulu, HI 96822 USA (e-mail: lucam@hawaii.edu).
Digital Object Identifier 10.1109/TVLSI.2008.2011239
Network-on-chip (NoC) is seen as the interconnection methodology
for such systems [3]. The motivations for this choice are the better
scalability of performance with the increasing number of processing
elements (PEs) compared to standard on-chip busses, the high regularity,
which improves layout and particularly the allocation of wiring
resources, and the layered design approach, which enables tackling the
SoC complexity.
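The scalability argument can be illustrated with a rough, idealized count of communication resources. The sketch below assumes a square 2-D mesh, which is only an illustrative choice here and not necessarily the topology adopted in this work, and it ignores routing, contention, and link direction. The function names are hypothetical.

```python
from itertools import product


def mesh_links(k: int) -> int:
    """Bidirectional links in a k x k mesh: k*(k-1) horizontal plus k*(k-1) vertical."""
    return 2 * k * (k - 1)


def mesh_bisection_links(k: int) -> int:
    """Links crossing a cut that splits an even-sided k x k mesh into two halves."""
    if k % 2:
        raise ValueError("this simple formula assumes an even k")
    return k


def mesh_avg_hops(k: int) -> float:
    """Average Manhattan distance over all ordered pairs of distinct tiles."""
    tiles = list(product(range(k), repeat=2))
    total = sum(
        abs(ax - bx) + abs(ay - by)
        for (ax, ay), (bx, by) in product(tiles, repeat=2)
    )
    return total / (len(tiles) * (len(tiles) - 1))


def summary(k: int) -> dict:
    """Compare a single shared bus with a k x k mesh for k*k processing elements."""
    return {
        "pes": k * k,
        "bus_channels": 1,  # one shared medium, so one transfer at a time
        "mesh_links": mesh_links(k),
        "mesh_bisection_links": mesh_bisection_links(k),
        "mesh_avg_hops": round(mesh_avg_hops(k), 3),
    }


if __name__ == "__main__":
    # Assumed sizes: 4, 16, and 64 PEs arranged as square meshes.
    for k in (2, 4, 8):
        print(summary(k))
```

Under these assumptions the number of mesh links grows with the number of PEs, while a shared bus always offers a single channel. The price is a multi-hop path between distant tiles.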
The computation is usually performed by a microprocessor. Even if
it is always possible to implement hard-wired PEs, a programmable ar-
chitecture gives the required flexibility to adapt and reuse the system
in different scenarios. The possibility of reusing the same chipset in
different devices is the only solution to face the growing costs of R&D
as well as of mask-sets [4]. The performance of the latest application-specific
instruction-set processors (ASIPs), together with the high availability of
transistors, is making the design of custom hard-wired logic increasingly less
convenient (time-to-market, respin risks). Furthermore, current ASIPs
offer a high level of configurability. It is now possible to optimize
code execution by adding hardware support for frequent or computation-intensive
operations.
Mastering the complexity of an MP-SoC requires new approaches
to replace the standard design flow. Nowadays, the register transfer
level (RTL)-to-netlist methodology has reached its maximum efficiency
and must be replaced by proper electronic system level
(ESL) methodologies [5].
Our research activity focused on the intersection among the var-
ious aspects of this new paradigm: the integration of a scalable NoC
communication infrastructure, a configurable microprocessor design,
an appropriate distributed programming model and a methodology for
system level modeling and exploration. All these aspects have been
taken into account and we show their effective integration by means
of a significant case study. The most recent work on NoC design
and implementation is in [6], which discusses physical design aspects
in terms of automated floorplan, timing, and power issues. However,
that work discusses neither system-level modeling aspects nor the
field-programmable gate array (FPGA) prototyping of a real-life application,
as is done in this work.
Section II discusses the ESL methodology used for system level
analysis and exploration. Its goal was to provide the required abstraction
in order to obtain the necessary visibility of the interactions among
different blocks. It was then possible to analyze different candidate architectures
and communication schemes, each of them with different tradeoffs in
terms of power, performance, and cost. In Section III, we motivate and
discuss the design choices concerning the NoC topology and routing,
the switch and its interface to the processing element, as well as the
software abstraction of the network based on a set of lightweight appli-
cation programming interfaces (APIs) compliant with a subset of the
message passing interface (MPI) protocol more suited for an embedded
environment, the embedded MPI (eMPI) APIs. Section IV presents a
case study, the NoCRay graphic accelerator: a parallel graphics ray
tracer rendering engine mapped first on FPGA for prototyping and then
implemented on a 90-nm standard-cell ASIC technology. Conclusions
are finally drawn in Section V.
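To give a flavor of the message-passing programming model mentioned above, the following minimal sketch distributes image rows from one master rank to worker ranks and collects the results. It is not the eMPI API: the `Comm` class, its `send` and `recv` methods, the master/worker split, and the placeholder per-row computation are all assumptions made for illustration, with threads and queues standing in for processors and network links.

```python
import queue
import threading


class Comm:
    """Toy communicator with one inbox per rank (hypothetical, not the eMPI API)."""

    def __init__(self, size):
        self.size = size
        self._inbox = [queue.Queue() for _ in range(size)]

    def send(self, data, dest, source):
        self._inbox[dest].put((source, data))

    def recv(self, rank):
        # Blocks until a message for this rank arrives.
        return self._inbox[rank].get()


def worker(comm, rank):
    # Receive a list of row indices from rank 0, process them, send results back.
    _, rows = comm.recv(rank)
    result = [(r, r * r) for r in rows]  # stand-in for real per-row rendering
    comm.send(result, dest=0, source=rank)


def run(num_ranks=16, num_rows=60):
    comm = Comm(num_ranks)
    workers = num_ranks - 1
    threads = [
        threading.Thread(target=worker, args=(comm, r))
        for r in range(1, num_ranks)
    ]
    for t in threads:
        t.start()
    # Rank 0 hands out rows in an interleaved fashion.
    for r in range(1, num_ranks):
        comm.send(list(range(r - 1, num_rows, workers)), dest=r, source=0)
    image = {}
    for _ in range(workers):
        _, result = comm.recv(0)
        image.update(dict(result))
    for t in threads:
        t.join()
    return [image[r] for r in range(num_rows)]


if __name__ == "__main__":
    print(run()[:5])
```

The point of the sketch is only that application code sees ranks and explicit send/receive calls, while the underlying transport remains hidden behind the communicator.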
II. SYSTEM LEVEL MODELING
The IP-XACT [7] standard has been used for system level modeling
of the MP-SoC NoC-based environment and for automatic RTL gener-
ation of the target architecture. This standard is tool-independent thus