Voltage-Frequency Island Partitioning for GALS-based Networks-on-Chip

Umit Y. Ogras, Radu Marculescu, Puru Choudhary, Diana Marculescu
Department of Electrical and Computer Engineering
Carnegie Mellon University, Pittsburgh, PA, USA
e-mail: {uogras,radum,puruc,dianam}@ece.cmu.edu

ABSTRACT
Due to high levels of integration and complexity, the design of multi-core SoCs has become increasingly challenging. In particular, energy consumption and the distribution of a single global clock signal throughout a chip have become major design bottlenecks. To deal with these issues, a globally asynchronous, locally synchronous (GALS) design is considered for achieving low power consumption and modular design. Such a design style fits nicely with the concept of voltage-frequency islands (VFIs), which has been recently introduced for achieving fine-grain system-level power management. This paper proposes a design methodology for partitioning an NoC architecture into multiple VFIs and assigning supply and threshold voltage levels to each VFI. Simulation results show about 40% energy savings for a real video application and demonstrate the effectiveness of our approach in reducing the overall system energy consumption. The results and functional correctness are validated using an FPGA prototype for an NoC with multiple VFIs.

Categories and Subject Descriptors
B.7 [Hardware]: Integrated circuits.

General Terms
Algorithms, Design.

Keywords
Voltage-frequency island, GALS, Multi-processor systems, networks-on-chip.

1. INTRODUCTION
On-chip communication and power management, recognized by the International Roadmap for Semiconductors as the main bottlenecks in providing increased performance and platform capabilities, require a drastic departure from classic design methodologies [1].
Networks-on-Chip (NoC) communication architectures have recently emerged as a promising solution for scalable on-chip communication beyond the capabilities of classical bus-based and Point-to-Point (P2P) architectures [7][13]. Besides its advantages in terms of modularity, design re-use, and performance, the NoC approach offers a matchless platform for implementing the GALS paradigm [4] and makes clock distribution and timing closure problems more manageable. Given that for complex systems built at 65 nm and below it is almost impossible to move signals across the die in a single clock cycle or in a power-efficient manner, a shift towards global on-chip asynchronous communication is needed. In addition, a GALS-based design style fits nicely with the concept of VFIs, which has been recently introduced for achieving fine-grain system-level power management. The use of VFIs in the NoC context is likely to provide better power-performance trade-offs than a single-voltage, single-clock-frequency design, while taking advantage of the natural partitioning and mapping of applications onto the NoC platform.

However, despite the huge potential for energy savings when using VFIs, the NoC design methodologies considered so far are limited to a single voltage-clock domain [2,10,15]. On the other hand, studies that do consider multiple VFIs assume that each module/core in the design belongs to a different island and that different islands are connected by P2P links [8,17].

To address these challenges (and unlike existing work), this paper explores the design and optimization of novel NoC architectures partitioned into multiple VFIs which rely on a GALS communication paradigm. In such a system, each voltage island can work at its own speed, while the communication across different voltage islands is achieved through mixed-clock/mixed-voltage FIFOs (see Figure 1).
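As a rough illustration of where mixed-clock/mixed-voltage FIFOs would be needed, the following minimal sketch counts the mesh links whose two endpoint tiles lie in different islands. The function name `boundary_links`, the 4x4 size, and the island layout are hypothetical choices made for this example; they are not taken from Figure 1 or from the methodology proposed in this paper.

```python
from typing import List, Tuple

Tile = Tuple[int, int]


def boundary_links(islands: List[List[int]]) -> List[Tuple[Tile, Tile]]:
    """Return the 2D-mesh links whose endpoint tiles belong to different islands.

    islands[r][c] is the (hypothetical) island id assigned to tile (r, c).
    Each returned link is one that would need a mixed-clock/mixed-voltage
    interface under the GALS organization described above.
    """
    rows, cols = len(islands), len(islands[0])
    crossing = []
    for r in range(rows):
        for c in range(cols):
            # Look only east and south so that every link is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and islands[r][c] != islands[nr][nc]:
                    crossing.append(((r, c), (nr, nc)))
    return crossing


# Illustrative 4x4 mesh split into 3 islands (assumed layout).
three_islands = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
]

# Two extreme partitions for comparison: one island, and one island per tile.
single_island = [[0] * 4 for _ in range(4)]
island_per_tile = [[r * 4 + c for c in range(4)] for r in range(4)]

if __name__ == "__main__":
    for name, layout in [
        ("single island", single_island),
        ("three islands", three_islands),
        ("one island per tile", island_per_tile),
    ]:
        print(name, len(boundary_links(layout)))
```

Under these assumptions, a coarser partition leaves fewer links that cross an island boundary, while placing every tile in its own island makes every mesh link a boundary link.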
This organization provides the flexibility to scale the frequency and voltage of individual VFIs in order to minimize energy consumption. As a result, the advantages of both NoC and VFI design styles can be exploited simultaneously.

Figure 1. A sample 2D mesh network with 3 VFIs. Communication across different islands is achieved through mixed-clock/mixed-voltage FIFOs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2007, June 4–8, 2007, San Diego, California, USA. Copyright 2007 ACM 978-1-59593-627-1/07/0006 ...$5.00.

The design of NoCs with multiple VFIs involves a number of critical steps. First, the granularity (i.e., the number of different VFIs) and the partitioning of the chip into VFIs need to be determined. While an NoC architecture where each processing/storage element (PE) constitutes a separate VFI exhibits the largest potential savings in energy consumption, this solution is very costly. Indeed, the associated design complexity increases due to the overhead in implementing the mixed-clock/mixed-voltage FIFOs