IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH 2009

Special Section Briefs

A Case Study for NoC-Based Homogeneous MPSoC Architectures

Sergio V. Tota, Mario R. Casu, Massimo Ruo Roch, Luca Macchiarulo, and Maurizio Zamboni

Abstract: The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability to next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that tackles at once the aspects of system level modeling, hardware architecture, and programming model has been successfully used for the implementation of a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, has been prototyped on a field-programmable gate array (FPGA) and then laid out in 90-nm technology. Post-layout results show very low power and area, as well as a clock frequency of 500 MHz. Results show that an array of small and simple processors outperforms a single high-end general purpose processor.

Index Terms: Multiprocessor systems-on-chip (MP-SoC), network-on-chip (NoC).

I. INTRODUCTION

The unrelenting trend toward higher computation performance has so far led to an increase in the complexity and the number of the functional units of single monolithic microprocessors. Recently, this trend has started to slow down, even though the number of transistors is expected to continue to double every three years [1]. Power and thermal issues, as well as design complexity, have begun to limit the performance growth rate compared with the increasing number of transistors available on a single die [2].
One way to cope with this productivity gap is the “tile-design” concept, which underlies a simple yet effective paradigm: parallelization through replication of many identical blocks, each placed in a tile of a regular array fabric. Instead of focusing on increasing the complexity of a single block, the solution aims at delivering performance through several replicas of the same basic block. This approach has the major positive consequence of making system design a matter of instantiation capability instead of architecture complexity, an objective which has to be pursued through innovative scalable hardware/software solutions. The resulting architecture can certainly be seen as an on-chip multiprocessor system. Therefore, we will refer to such a system as a “homogeneous” multiprocessor system-on-chip (MP-SoC), although the recent literature seems to reserve the MP-SoC acronym for the case of “heterogeneous” processors.

MP-SoC design is a multidisciplinary research activity that encompasses on-chip communication infrastructures, microprocessor architectures, programming models, codesign/cosimulation flows, and flexible methodologies for system level modeling and exploration.

Manuscript received December 10, 2007; revised April 04, 2008. First published February 06, 2009; current version published February 19, 2009. S. V. Tota, M. R. Casu, M. R. Roch, and M. Zamboni are with the Dipartimento di Elettronica, Politecnico di Torino, I-10129 Torino, Italy (e-mail: sergio.tota@polito.it; mario.casu@polito.it; massimo.ruoroch@polito.it; maurizio.zamboni@polito.it). L. Macchiarulo is with the Department of Electrical Engineering, University of Hawaii, Honolulu, HI 96822 USA (e-mail: lucam@hawaii.edu). Digital Object Identifier 10.1109/TVLSI.2008.2011239

Network-on-chip (NoC) is seen as the interconnection methodology for such systems [3].
The motivations for this choice are the better scalability of performance with an increasing number of processing elements (PEs) compared with standard on-chip buses, the high regularity, which improves layout and particularly the allocation of wiring resources, and the layered design approach, which makes it possible to tackle the SoC complexity.

The computation is usually performed by a microprocessor. Even if it is always possible to implement hard-wired PEs, a programmable architecture gives the flexibility required to adapt and reuse the system in different scenarios. Reusing the same chipset in different devices is the only way to face the growing costs of R&D and of mask sets [4]. The performance of the latest application-specific instruction-set processors (ASIPs), together with the high availability of transistors, is making the design of custom hard-wired logic less and less convenient (time-to-market, respin risks). Furthermore, current ASIPs offer a high level of configurability. It is now possible to optimize code execution by adding hardware support for frequent or computation-intensive operations.

Mastering the complexity of an MP-SoC requires new approaches to replace the standard design flow. Nowadays, the register transfer level (RTL)-to-netlist methodology has reached its maximum efficiency and must be replaced by proper electronic system level (ESL) methodologies [5].

Our research activity focused on the intersection of the various aspects of this new paradigm: the integration of a scalable NoC communication infrastructure, a configurable microprocessor design, an appropriate distributed programming model, and a methodology for system level modeling and exploration. All these aspects have been taken into account, and we show their effective integration by means of a significant case study.
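The scalability argument made above for NoCs over shared buses can be illustrated with a rough sketch. The code below is not taken from the paper: it assumes, purely for illustration, a square 2D mesh and a single shared bus, and all function names are hypothetical. It counts how many links a mesh offers for concurrent transfers and how far, on average, a message travels, and contrasts this with a bus that serves one transfer at a time.

```python
# Illustrative sketch only. A square 2D mesh and a single shared bus are
# ASSUMED here; neither the topology nor any traffic model comes from the paper.

def bus_concurrent_transfers(num_pes: int) -> int:
    """A single shared bus carries one transfer at a time, whatever the PE count."""
    return 1


def mesh_links(side: int) -> int:
    """Number of bidirectional links in a side x side 2D mesh."""
    return 2 * side * (side - 1)


def mesh_bisection_links(side: int) -> int:
    """Links cut when an even-sided mesh is split into two equal halves."""
    return side


def mesh_average_hops(side: int) -> float:
    """Mean Manhattan distance between distinct tiles, by direct enumeration."""
    if side < 2:
        raise ValueError("need at least a 2 x 2 mesh")
    tiles = [(x, y) for x in range(side) for y in range(side)]
    total = 0
    pairs = 0
    for (x1, y1) in tiles:
        for (x2, y2) in tiles:
            if (x1, y1) != (x2, y2):
                total += abs(x1 - x2) + abs(y1 - y2)
                pairs += 1
    return total / pairs


if __name__ == "__main__":
    for side in (2, 4, 8):
        pes = side * side
        print(
            f"{pes:3d} PEs: bus transfers={bus_concurrent_transfers(pes)}, "
            f"mesh links={mesh_links(side)}, "
            f"bisection links={mesh_bisection_links(side)}, "
            f"avg hops={mesh_average_hops(side):.2f}"
        )
```

Under these assumptions, a 16-PE system (the processor count of the design described in the abstract) arranged as a 4 x 4 mesh would offer 24 links, while the average distance between tiles stays below three hops. The number of links grows with the number of tiles, whereas the shared bus remains a single serialized resource.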
The most recent work on NoC design and implementation is in [6], which discusses physical design aspects in terms of automated floorplanning, timing, and power issues. However, that work discusses neither system-level modeling aspects nor the field-programmable gate array (FPGA) prototyping of a real-life application, both of which are covered in this work.

Section II discusses the ESL methodology used for system level analysis and exploration. Its goal was to provide the abstraction required to obtain the necessary visibility of the interactions among different blocks. It was then possible to analyze different candidate architectures and communication schemes, each with different tradeoffs in terms of power, performance, and cost. In Section III, we motivate and discuss the design choices concerning the NoC topology and routing, the switch and its interface to the processing element, as well as the software abstraction of the network. This abstraction is based on a set of lightweight application programming interfaces (APIs) compliant with a subset of the message passing interface (MPI) protocol that is better suited to an embedded environment: the embedded MPI (eMPI) APIs. Section IV presents a case study, the NoCRay graphic accelerator: a parallel graphics ray-tracer rendering engine mapped first on FPGA for prototyping and then implemented in a 90-nm standard-cell ASIC technology. Conclusions are finally drawn in Section V.

II. SYSTEM LEVEL MODELING

The IP-XACT [7] standard has been used for system level modeling of the MP-SoC NoC-based environment and for automatic RTL generation of the target architecture. This standard is tool-independent thus