J. Parallel Distrib. Comput. 71 (2011) 651–663

Characterizing the impact of process variation on 45 nm NoC-based CMPs

C. Hernández *, A. Roca, J. Flich, F. Silla, J. Duato

Grupo de Arquitecturas Paralelas, Departamento de Informática de Sistemas y Computadores, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain

Article info

Article history: Received 2 March 2010; received in revised form 14 July 2010; accepted 14 September 2010; available online 26 September 2010

Keywords: NoC (Network-on-Chip); CMP (Chip multiprocessor); Process variations; Process mapping; Router design

Abstract

Current integration scales make it possible to design chip multiprocessors with a large number of cores interconnected by a NoC. Unfortunately, they also bring process variation, which places a new burden on processor manufacturers. In the NoC, variability causes the delays of links and routers to deviate from those established at design time. In this paper we analyze how variability affects the NoC by applying a new variability model to 100 instances of an 8 × 8 mesh NoC synthesized using 45 nm technology. We also show that GALS-based NoCs exhibit communication bottlenecks caused by the slower components of the network, which lead to congestion and thus reduce performance. This performance reduction ultimately affects the applications executed on the CMP, because they may be mapped to slower areas of the chip. We show that a mapping algorithm that takes variability data into account may improve application execution time by up to 50%.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

The latest developments by the main processor manufacturers show that the number of cores per die in commercial chips is continuously increasing over time.
Indeed, the new Sun Rainbow Falls featuring sixteen cores [33], the Magny-Cours processor [8] by AMD, composed of twelve cores partitioned into two Istanbul dies [40], and the new IBM Power7 [17] and Intel Nehalem-EX [20] architectures, which include eight cores, clearly confirm this trend. In the near future, the number of cores included in a single die will probably be much larger, as shown by the Intel prototype chip called Single-chip Cloud Computer [36]. This prototype includes 24 dual-core tiles based on the x86 architecture. The Polaris chip [14], another Intel prototype, features 80 cores, although these cores are much simpler than those in the Single-chip Cloud Computer prototype. In addition, the Tile-Gx100 chip by Tilera [44], which includes 100 general-purpose processor cores, exemplifies the likely future, in which processors will be composed of several tens, or even a few hundreds, of cores.

As the number of cores in a die increases, which will be the common case as VLSI technologies continue to reach larger integration scales, it is no longer feasible to interconnect them with a bus or a crossbar because of scalability concerns, and thus a network-on-chip (NoC) [24] must be used. In fact, both academia and industry agree that NoCs are the best option for interconnecting a high number of cores [21]. For example, a 1.2 TB/s ring is used in the recent Nehalem-EX processor [32]. In the 48-core Intel prototype mentioned above, the 24 dual-core tiles are interconnected by a 6 × 4 mesh. The Polaris and Tile-Gx100 chips also use meshes: the former features an 8 × 10 2D mesh, while the latter includes six parallel 10 × 10 two-dimensional meshes.

* Corresponding author. E-mail addresses: carherlu@gap.upv.es (C. Hernández), anrope2@gap.upv.es (A. Roca), jflich@disca.upv.es (J. Flich), fsilla@disca.upv.es (F. Silla), jduato@disca.upv.es (J. Duato).
Current and future CMP chips are possible because of the tremendous integration scales used. However, these integration scales rely on such small feature sizes that manufactured devices show some degree of unpredictability. This process variability arises because current manufacturing processes can no longer translate designs perfectly into real devices. Its main consequence is that the delay characteristics of manufactured devices do not exactly match the parameters established at the design phase. For this reason, process variation is one of the most important challenges to be tackled in new on-chip system architectures, from 65 nm manufacturing technologies down to 16 nm ones [31].

The main sources of process variability have recently been gathered in a detailed and comprehensive model for the characterization of process variation in NoC links [12]. Results from this model clearly show that the effect of variability on NoC links is not negligible, even though links are usually built on semi-global metal layers, where metallizations are much wider than in lower metal layers, and should therefore be noticeably less affected by variability than active devices located on the silicon surface. Additionally, link repeaters also suffer from process variation. As a result of this variability, the links in a given network design exhibit different delays, although they were initially designed to behave in the same way.

0743-7315/$ – see front matter. doi:10.1016/j.jpdc.2010.09.006