846 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 4, APRIL 2007

The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series

Jonathan Chang, Senior Member, IEEE, Ming Huang, Jonathan Shoemaker, John Benoit, Member, IEEE, Szu-Liang Chen, Wei Chen, Siufu Chiu, Raghuraman Ganesan, Gloria Leong, Venkata Lukka, Stefan Rusu, Fellow, IEEE, and Durgesh Srivastava

Abstract—The 16-way set-associative, single-ported 16-MB cache for the Dual-Core Intel Xeon Processor 7100 Series uses a 0.624 µm² cell in a 65-nm 8-metal technology. Low-power techniques are implemented in the L3 cache to minimize both leakage and dynamic power. Sleep transistors are used in the SRAM array and peripherals, reducing the cache leakage by more than 2X. Only 0.8% of the cache is powered up for a cache access. Dynamic cache line disable (Intel Cache Safe Technology) with a history buffer protects the cache from latent defects and infant mortality failures.

Index Terms—Circuit design, computer architecture, manufacturability, microprocessor, on-die cache, power reduction, reliability, test.

I. INTRODUCTION

The Dual-Core Intel Xeon Processor 7100 with up to 16-MB unified L3 cache is implemented in a 65-nm process technology with eight copper interconnect layers [1], [2]. Fig. 1 shows the die photo of the processor. It consists of two cores, each with a 1-MB L2 cache. The processor has a total of 1.3 billion transistors, while each core has over 100 million transistors. The processor runs at 3.5 GHz at 1.25 V. It supports thermal design power levels of 150 W and 95 W. The L3 cache and the associated logic have a separate power supply from the cores, PLL, and I/O. Fig. 2 shows the four voltage domains of the processor. The front side bus can run at 800 or 667 MT/s in a 3-load configuration. Both L3 and L2 use the same 0.624-µm² bit cell.
Sleep transistors were designed into the SRAM arrays and their peripherals to achieve 0.75 W/MB average power while retaining the cache contents at all times [3]. The overall leakage power reduction is more than 2X, as confirmed by silicon measurements. Long-channel-length devices were used wherever possible to further reduce leakage power. A shutdown option is implemented in the SRAM arrays to minimize the leakage power of inactive sub-arrays. Aggressive clock gating, fine-grained sleep resolution, and wake-up counters were implemented to minimize dynamic power. Column redundancy is available in the data and tag arrays. Block redundancy is available through cache sizing. Intel Cache Safe Technology, formerly known as Pellston technology, is used to keep track of the random ECC events of each cache line and to disable the cache lines susceptible to latent defects and infant mortality [4]. Extensive test solutions are available to ensure manufacturability.

Manuscript received August 25, 2006; revised December 19, 2006. The authors are with Intel Corporation, Santa Clara, CA 95052 USA (e-mail: jonathan.chang@intel.com). Digital Object Identifier 10.1109/JSSC.2007.892185

Fig. 1. Die photo.

Fig. 2. Voltage domains.

II. CACHE ORGANIZATION AND FLOORPLAN

The logical cache size is 16 MB; with ECC and redundancy it is 19 MB. The L3 cache is a 16 K-set, 16-way set-associative cache, organized as shown in Fig. 3. The cache line size is 64 bytes, which is sent in two chunks on the data buses. Each chunk has 256 data bits, 32 ECC bits, and 2 redundancy bits. Each physical address is 40 bits. Cache sizing is done through set reduction: the number of sets can be configured to 16 K, 8 K, or 4 K. Table I summarizes the cache organization of the three major configurations: 16 M, 8 M, and 4 M. The set associativity stays at 16 for all three configurations; only the number of sets changes to reach the target cache size.
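As a rough illustration of the organization described above, the following Python sketch splits a 40-bit physical address into tag, set index, and byte offset for the three set counts. The function names are hypothetical, and the assumption that the set index is taken from the address bits directly above the line offset is an illustrative one, not a detail stated in the text. The tag width computed here covers address bits only, not any state or ECC bits that may be stored alongside the tag.

```python
# Illustrative sketch only: field placement is an assumption, not taken from the paper.
PHYSICAL_ADDRESS_BITS = 40   # from the text
LINE_BYTES = 64              # from the text
WAYS = 16                    # from the text


def address_fields(num_sets):
    """Return field widths and logical capacity for a given number of sets."""
    offset_bits = (LINE_BYTES - 1).bit_length()   # bits selecting a byte in the line
    index_bits = (num_sets - 1).bit_length()      # bits selecting a set
    tag_bits = PHYSICAL_ADDRESS_BITS - index_bits - offset_bits
    capacity_mb = num_sets * WAYS * LINE_BYTES // (1024 * 1024)
    return {
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "capacity_mb": capacity_mb,
    }


def split_address(addr, num_sets):
    """Split a physical address into (tag, set index, byte offset)."""
    fields = address_fields(num_sets)
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> fields["offset_bits"]) & (num_sets - 1)
    tag = addr >> (fields["offset_bits"] + fields["index_bits"])
    return tag, index, offset


if __name__ == "__main__":
    for sets in (16 * 1024, 8 * 1024, 4 * 1024):
        print(sets, address_fields(sets))
```

Under these assumptions the address tag is 20, 21, and 22 bits wide for the 16 K, 8 K, and 4 K set configurations respectively, so the smallest cache configuration needs the widest tag.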
The tag array and its associated datapath and control logic are built to support the largest tag width, which comes from the 4 M configuration. The floorplan is built in a wrap-around style. Fig. 4 shows the floorplan and an example of the data grouping. The data cache is constructed with 256 regular sub-arrays and 32 redundancy sub-arrays. A regular sub-array is a 64 KB sub-array, storing 32 bits. A redundancy sub-array is a 68 KB sub-array, storing