Distributing the Frontend for Temperature Reduction
Pedro Chaparro, Grigorios Magklis, José González and Antonio González
Intel Barcelona Research Center - Intel Labs – UPC
{pedro.chaparro.monferrer, grigoriosx.magklis, pepe.gonzalez, antoniox.gonzalez}@intel.com
Abstract
Due to increasing power densities, both on-chip
average and peak temperatures are fast becoming a
serious bottleneck in processor design. This is due to
the cost of removing the heat generated, and the
performance impact of dealing with thermal
emergencies. So far, microarchitectural techniques to
control temperature have mainly focused on the
processor backend (in particular the execution units),
whereas the frontend has not received much attention.
However, as the temperature of the backend remains
controlled and the processor throughput increases, the
heat dissipated by the frontend becomes more
significant, making it one of the major contributors to the
total average temperature.
This paper proposes and evaluates a distributed
frontend for clustered microarchitectures that is able
to reduce power density and temperature. First, a
distributed mechanism for renaming and committing
instructions is proposed. Second, a sub-banked trace
cache with a bank hopping mechanism is presented.
Finally, a method to improve the sub-banking is
proposed based on a biased mapping function to
distribute bank accesses to balance temperature.
1. Introduction
Power dissipation is one of the major hurdles in the
design of next-generation microarchitectures. Power
density increases with each generation because
frequency and leakage currents are scaling up very
quickly and their effect cannot be offset by decreasing the
supply voltage. Power density directly translates into
heat which must be removed from the processor die in
order to keep the silicon temperature below a certain
limit. The increase in power density raises the cost of
the cooling system and threatens the
performance benefits that can be obtained from the
ever-growing transistor density. For instance, the
cooling system of a processor was traditionally designed to
support the worst-case peak temperature. Because of
the growing cost of cooling solutions and form-factor
constraints (especially in mobile computers), the
cooling system is now designed for the common case,
and a thermal emergency mechanism is in charge of
restoring the processor to its operating temperature.
This solution has been adopted because the processor
spends most of the time running at much lower
temperatures than the worst-case scenario. Whenever a
thermal emergency arises, a back-up mechanism to
cool down the chip is triggered. Such mechanisms have
a negative impact on performance.
The cost of the cooling system has been quantified
on the order of $1-3 or more per Watt when the average
power exceeds 40 Watts [4][14], which represents a
significant part of the total cost of the chip. The cost of
the heat removal system is especially important for
data centers, where air conditioning is a main
contributor to the overall data center cost [22].
Furthermore, circuit reliability depends exponentially
upon operating temperature. Temperature variations
account for over 50% of electronic failures [28].
In order to reduce dynamic power dissipation, chip
designers rely on scaling down the supply voltage. To
counteract the negative effect of a lower supply voltage
on gate delay, the threshold voltage is also scaled down
along with the supply voltage. However, lowering the
threshold voltage has a significant impact on leakage
current due to the exponential relationship between
them. In fact, it is expected that within a few process
generations the contribution of leakage power to the
total power will be comparable to that of dynamic
power [4][9]. It is also important to note that leakage
power is exponentially dependent on temperature.
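
The exponential relationship described above can be illustrated with a minimal sketch. It assumes the textbook first-order subthreshold model, in which leakage current is proportional to exp(-Vth / (n * kT/q)). This model is not taken from this paper. The function names, the slope factor n = 1.5, and the example voltages are hypothetical choices for illustration only, and the prefactor and the temperature dependence of the threshold voltage itself are deliberately ignored.

```python
import math

# Boltzmann constant divided by the electron charge, in volts per kelvin.
BOLTZMANN_OVER_Q = 8.617333262e-5


def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN_OVER_Q * temp_kelvin


def relative_subthreshold_leakage(vth, temp_kelvin, n=1.5):
    """Unitless leakage factor exp(-Vth / (n * kT/q)).

    Assumption: simplified first-order model; the technology-dependent
    prefactor is omitted, so only ratios of this value are meaningful.
    """
    return math.exp(-vth / (n * thermal_voltage(temp_kelvin)))


def leakage_ratio(vth_old, vth_new, temp_kelvin, n=1.5):
    """Factor by which leakage changes when Vth moves from vth_old to vth_new."""
    new = relative_subthreshold_leakage(vth_new, temp_kelvin, n)
    old = relative_subthreshold_leakage(vth_old, temp_kelvin, n)
    return new / old


if __name__ == "__main__":
    # Hypothetical example: lowering Vth from 0.40 V to 0.30 V at 300 K.
    print(f"leakage grows by a factor of {leakage_ratio(0.40, 0.30, 300.0):.1f}")
    # Same Vth, hotter die: the simplified model also predicts more leakage.
    hot = relative_subthreshold_leakage(0.30, 360.0)
    cool = relative_subthreshold_leakage(0.30, 300.0)
    print(f"heating from 300 K to 360 K multiplies leakage by {hot / cool:.1f}")
```

Under these assumptions, a modest reduction in threshold voltage multiplies leakage by roughly an order of magnitude, and a hotter die leaks more at the same threshold voltage. This is the feedback between temperature and leakage power that the text refers to.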
On the other hand, wire delays scale much more slowly
than gate delays [1][3][21] and pose a serious obstacle
to the scalability of superscalar processors. Clustered
microarchitectures are an effective organization to deal
with the problem of wire delays and complexity by
means of partitioning some of the processor resources
[6][11], such as the processor backend, and attempting
to minimize the use of global (slow) communications.
Clustered microprocessors achieve a significant
reduction of the backend temperature due to an
Proceedings of the 11th Int’l Symposium on High-Performance Computer Architecture (HPCA-11 2005)
1530-0897/05 $20.00 © 2005 IEEE