J. Parallel Distrib. Comput. 71 (2011) 651–663
Characterizing the impact of process variation on 45 nm NoC-based CMPs
C. Hernández *, A. Roca, J. Flich, F. Silla, J. Duato
Grupo de Arquitecturas Paralelas, Departamento de Informática de Sistemas y Computadores, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
article info
Article history:
Received 2 March 2010
Received in revised form 14 July 2010
Accepted 14 September 2010
Available online 26 September 2010
Keywords:
NoC (or Network-on-Chip)
CMP (or Chip multiprocessor)
Process variations
Process mapping
Router design
abstract
Current integration scales make it possible to design chip multiprocessors with a large number of cores
interconnected by a NoC. Unfortunately, they also bring process variation, placing a new burden on
processor manufacturers.
Regarding the NoC, variability causes the delays of links and routers to deviate from those
established at design time. In this paper we analyze how variability affects the NoC by applying a new
variability model to 100 instances of an 8 × 8 mesh NoC synthesized using 45 nm technology. We also
show that GALS-based NoCs present communication bottlenecks due to the slower components of the
network, which cause congestion and thus reduce performance. This performance reduction ultimately affects
the applications executed on the CMP, because they may be mapped to slower areas of the chip. We
show that a mapping algorithm that takes variability data into account may improve application
execution time by up to 50%.
© 2010 Elsevier Inc. All rights reserved.
1. Introduction
The latest developments by the main processor manufacturers
show that the number of cores per die in commercial chips
keeps increasing. Indeed, the new Sun
Rainbow Falls featuring sixteen cores [33], the Magny-Cours
processor [8] by AMD, composed of twelve cores partitioned into
two Istanbul dies [40], and the new IBM Power7 [17] and Intel
Nehalem-EX [20] architectures, which include eight cores, clearly
confirm this trend. In the near future, the number of cores
in a single die will probably be much larger, as shown by the
Intel prototype chip called Single-chip Cloud Computer [36].
This prototype includes 24 dual-core tiles based on the x86
architecture. The Polaris chip [14], another Intel prototype,
features 80 cores, although these cores are much simpler than
those in the Single-chip Cloud Computer prototype. The
Tile-Gx100 chip by Tilera [44], which includes 100 general-purpose
processor cores, also exemplifies the likely
future, where processors will be composed of several tens, or even
a few hundreds, of cores.
As the number of cores in a die increases, which will be the
common case as VLSI technologies continue to reach larger
integration scales, it is no longer feasible to interconnect them with a
bus or a crossbar due to scalability concerns, and thus a
network-on-chip (NoC) [24] must be used. In fact, both academia and
industry agree that NoCs are the best option for interconnecting
a high number of cores [21]. For example, a 1.2 TB/s ring is
used in the recent Nehalem-EX processor [32]. In the case of the
48-core prototype by Intel mentioned above, the 24 dual-core tiles
are interconnected by a 6 × 4 mesh. The Polaris and the
Tile-Gx100 chips also use meshes: the former features an 8 × 10 2D
mesh, while the latter includes six parallel 10 × 10 two-dimensional
meshes.
* Corresponding author.
E-mail addresses: carherlu@gap.upv.es (C. Hernández), anrope2@gap.upv.es
(A. Roca), jflich@disca.upv.es (J. Flich), fsilla@disca.upv.es (F. Silla),
jduato@disca.upv.es (J. Duato).
Current and future CMP chips are possible because of the
tremendous integration scales used. However, these integration
scales rely on such a small feature size that manufactured devices
exhibit some degree of unpredictability due to process
variability, which arises because current manufacturing
processes can no longer translate designs perfectly into
real devices. The main consequence of process variation is that
manufactured devices present delay characteristics that do not
exactly match the parameters established at the design phase. For
this reason, process variation is one of the most important
challenges to be tackled in new on-chip system architectures,
from 65 nm manufacturing technologies down to 16 nm
ones [31].
The main sources of process variability have recently been gathered
in a detailed and comprehensive model for the characterization
of process variation in NoC links [12]. Results from this model
clearly show that the effect of variability on NoC links is not
negligible, even though links are usually built on semi-global metal layers,
where metallizations are much wider than in lower metal layers
and should therefore be noticeably less affected by variability
than active devices located on the silicon surface. Additionally, link
repeaters also suffer from process variation. As a result, for a
given design, links in the network present
different delays, although they were initially designed to behave
identically.
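The effect described above, nominally identical links ending up with different delays, can be illustrated with a toy sketch. It is not the variability model of [12]: it simply assumes that each link delay is drawn independently from a Gaussian around a nominal value and ignores spatial correlation. The function name `sample_link_delays` and the parameters `nominal_ps` and `sigma_frac` are hypothetical and chosen only for illustration.

```python
import random


def sample_link_delays(rows=8, cols=8, nominal_ps=100.0, sigma_frac=0.05, seed=0):
    """Return {(src, dst): delay_ps} for each neighbour-to-neighbour link of a mesh.

    Assumption (illustrative only): every link delay is an independent Gaussian
    sample centred on nominal_ps with standard deviation sigma_frac * nominal_ps.
    """
    rng = random.Random(seed)  # seeded so that one chip "instance" is reproducible
    delays = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # east and south neighbours
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    delay = rng.gauss(nominal_ps, sigma_frac * nominal_ps)
                    delays[((r, c), (nr, nc))] = max(delay, 0.0)
    return delays


delays = sample_link_delays()
fastest = min(delays.values())
slowest = max(delays.values())
print(f"{len(delays)} links, fastest {fastest:.1f} ps, slowest {slowest:.1f} ps")
```

Changing `seed` yields a different chip instance of the same design. The spread between the fastest and slowest link in each instance is what makes some regions of the network slower than others.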
doi:10.1016/j.jpdc.2010.09.006