846 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 4, APRIL 2007
The 65-nm 16-MB Shared On-Die L3 Cache for the
Dual-Core Intel Xeon Processor 7100 Series
Jonathan Chang, Senior Member, IEEE, Ming Huang, Jonathan Shoemaker, John Benoit, Member, IEEE,
Szu-Liang Chen, Wei Chen, Siufu Chiu, Raghuraman Ganesan, Gloria Leong, Venkata Lukka,
Stefan Rusu, Fellow, IEEE, and Durgesh Srivastava
Abstract—The 16-way set associative, single-ported 16-MB
cache for the Dual-Core Intel Xeon Processor 7100 Series uses a
0.624-µm² cell in a 65-nm 8-metal technology. Low-power techniques
are implemented in the L3 cache to minimize both leakage
and dynamic power. Sleep transistors are used in the SRAM array
and peripherals, reducing the cache leakage by more than 2X.
Only 0.8% of the cache is powered up for a cache access. Dynamic
cache line disable (Intel Cache Safe Technology) with a history
buffer protects the cache from latent defects and infant mortality
failures.
Index Terms—Circuit design, computer architecture, manufac-
turability, microprocessor, on-die cache, power reduction, relia-
bility, test.
I. INTRODUCTION
THE Dual-Core Intel Xeon Processor 7100 with up to
16-MB unified L3 cache is implemented in a 65-nm
process technology with eight copper interconnect layers [1],
[2]. Fig. 1 shows the die photo of the processor. It consists of
two cores, each with a 1-MB L2 cache. The processor has a
total of 1.3 billion transistors, while each core has over 100
million transistors. The processor runs at 3.5 GHz at 1.25 V.
It supports 150 W and 95 W thermal design power. The L3
cache and the associated logic have a separate power supply
from the cores, PLL, and I/O. Fig. 2 shows the four voltage
domains of the processor. The front side bus can run at 800 or
667 MT/s in a three-load configuration. Both L3 and L2 use the
same 0.624-µm² bit cell. Sleep transistors were designed into
the SRAM arrays and their peripherals to achieve 0.75 W/MB
average power while retaining the cache contents at all times
[3]. The overall leakage power reduction is more than 2X, as
confirmed by silicon measurements. Long-channel-length devices
were used wherever possible to further reduce the leakage
power consumption. A shutdown option is implemented in the
SRAM arrays to minimize the leakage power for the inactive
sub-arrays. Aggressive clock gating, fine-grained sleep reso-
lution, and wake-up counters were implemented to minimize
the dynamic power. Column redundancy is available in data
and tag arrays. Block redundancy is available through cache
sizing. Intel Cache Safe Technology, formerly known as Pellston
technology, is used to keep track of the random ECC events of
each cache line and to disable the cache lines susceptible to latent
defects and infant mortality [4]. Extensive test solutions are
available to ensure manufacturability.
Manuscript received August 25, 2006; revised December 19, 2006.
The authors are with Intel Corporation, Santa Clara, CA 95052 USA (e-mail:
jonathan.chang@intel.com).
Digital Object Identifier 10.1109/JSSC.2007.892185
Fig. 1. Die photo.
Fig. 2. Voltage domains.
II. CACHE ORGANIZATION AND FLOORPLAN
The logical cache size is 16 MB. It is 19 MB with ECC and
redundancy. The L3 cache is a 16 K-set, 16-way set-associative
cache, organized as shown in Fig. 3. The cache line size is 64
bytes, which is sent in two chunks on the data buses. Each chunk
has 256 data bits, 32 ECC bits, and 2 redundancy bits. Each
physical address is 40 bits. Cache sizing is done through set
reduction: the number of sets can be configured to 16 K, 8 K, or 4 K. Table I
summarizes the cache organization of the three major configurations:
16 MB, 8 MB, and 4 MB. The set associativity stays at 16
for all three configurations. The tag array and the associated datapath
and control logic are built to support the largest tag width, which
comes from the 4-MB configuration.
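The relation between set count and tag width can be checked with a short sketch. It takes the 40-bit physical address, 64-byte line, and 16-way associativity from the text, and assumes a conventional split (low bits as line offset, next bits as set index, remaining bits as tag). That split, and the names `field_widths`, `split_address`, and `capacity_bytes`, are illustrative assumptions; the actual tag entry may hold additional state or ECC bits not modeled here.

```python
# Illustrative sketch only: assumes a conventional offset / index / tag split.
PHYS_ADDR_BITS = 40   # physical address width, from the text
LINE_BYTES = 64       # cache line size, from the text
WAYS = 16             # set associativity, from the text


def field_widths(num_sets):
    """Return (tag_bits, index_bits, offset_bits) for a power-of-two set count."""
    offset_bits = (LINE_BYTES - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    tag_bits = PHYS_ADDR_BITS - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(addr, num_sets):
    """Split a physical address into (tag, set index, line offset)."""
    _, index_bits, offset_bits = field_widths(num_sets)
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def capacity_bytes(num_sets):
    """Logical data capacity: sets x ways x line size."""
    return num_sets * WAYS * LINE_BYTES


for sets in (16 * 1024, 8 * 1024, 4 * 1024):
    tag_bits, index_bits, offset_bits = field_widths(sets)
    print(f"{sets // 1024:>2} K sets: {capacity_bytes(sets) >> 20:>2} MB, "
          f"tag={tag_bits} index={index_bits} offset={offset_bits}")
```

Under these assumptions, halving the number of sets removes one index bit and adds one tag bit, so the 4 K-set configuration has the widest tag (22 bits versus 20 bits at 16 K sets). This is consistent with sizing the tag array for the smallest cache configuration.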
The floorplan is built in a wrap-around style. Fig. 4 shows the
floorplan and an example of the data grouping. The data cache
is constructed with 256 regular sub-arrays and 32 redundancy
sub-arrays. A regular sub-array is a 64 KB sub-array, storing
32 bits. A redundancy sub-array is a 68 KB sub-array, storing