IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 25, NO. 4, APRIL 2017 1271

A 16-Core Voltage-Stacked System With Adaptive Clocking and an Integrated Switched-Capacitor DC–DC Converter

Sae Kyu Lee, Student Member, IEEE, Tao Tong, Student Member, IEEE, Xuan Zhang, Member, IEEE, David Brooks, Fellow, IEEE, and Gu-Yeon Wei, Member, IEEE

Abstract—This paper presents a 16-core voltage-stacked system with adaptive frequency clocking (AFClk) and a fully integrated voltage regulator that demonstrates efficient on-chip power delivery for multicore systems. Voltage stacking alleviates power delivery inefficiencies due to off-chip parasitics but adds complexity to combat internal voltage noise. To address the corresponding issue of internal voltage noise, the system utilizes an AFClk scheme with an efficient switched-capacitor dc–dc converter to mitigate noise on the stack layers and to improve system performance and efficiency. Experimental results demonstrate robust voltage noise mitigation as well as the potential of voltage stacking as a highly efficient power delivery scheme. This paper also illustrates that augmenting the hardware techniques with intelligent workload allocation that exploits the inherent properties of voltage stacking can preemptively reduce the interlayer activity mismatch and improve system efficiency.

Index Terms—Adaptive frequency clocking (AFClk), dc–dc converter, multicore, power delivery, voltage noise, voltage stacking.

I. INTRODUCTION

Efficient power delivery is a critical design target for modern computing systems from high-performance servers to mobile devices. Continued decreases in supply voltages and aggressive power reduction techniques (e.g., clock and power gating) under a fixed power budget have led to increases in average current draw and worsening current transients, forcing stringent requirements on the power delivery impedance.
Today’s 100-W high-performance processors operate under 1 V, draw in excess of 100 A, and require <1-mΩ impedance for a 10% voltage noise margin, which is extremely challenging to achieve. Furthermore, significant I²R power loss in the off-chip parasitic resistance of the power delivery network can greatly degrade the overall system efficiency, while electromigration due to high current levels is also a concern. Exacerbating the issue, the off-chip components ostensibly have not scaled, contrary to the ever-decreasing power delivery impedance requirements needed to keep up with the high current demands of modern computing systems.

Manuscript received June 1, 2016; revised October 2, 2016; accepted November 3, 2016. Date of publication December 19, 2016; date of current version March 20, 2017. This work was supported in part by NSF under Grant CCF-0903437 and Grant CCF-1218298, and in part by DARPA under Grant HR0011-13-C-0022. This paper was recommended by Associate Editor I. Savidis.
S. K. Lee, T. Tong, D. Brooks, and G.-Y. Wei are with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138 USA (e-mail: saekyu@eecs.harvard.edu; taotong@seas.harvard.edu; dbrooks@eecs.harvard.edu; guyeon@eecs.harvard.edu).
X. Zhang was with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138 USA. He is now with Washington University in St. Louis, St. Louis, MO 63130 USA (e-mail: xuan.zhang@wustl.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2016.2633805

Voltage stacking is an on-chip power delivery solution that delivers a high voltage to the chip by vertically stacking voltage domains in series and recycling charge through the stacked layers, thereby reducing the overall chip current demands [1]–[11].
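
As a rough check on the figures above, the following Python sketch applies I = P/V and Z = ΔV/I to a chip whose voltage domains are stacked in series. It is not taken from the paper: the function name `delivery_budget`, the equal-power-per-layer idealization, the choice to treat the noise margin as a fraction of the delivered voltage, and the 1-mΩ off-chip parasitic resistance are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DeliveryBudget:
    input_voltage_v: float    # voltage delivered to the chip
    input_current_a: float    # current drawn through the off-chip network
    max_impedance_ohm: float  # impedance that keeps noise within the margin
    i2r_loss_w: float         # loss in the assumed off-chip resistance


def delivery_budget(chip_power_w, layer_voltage_v, n_layers=1,
                    noise_margin=0.10, parasitic_res_ohm=1e-3):
    """Estimate off-chip delivery numbers for an ideally balanced stack.

    Assumptions (hypothetical, not from the paper): every layer draws the
    same power, the noise margin is a fraction of the delivered voltage,
    and parasitic_res_ohm is a made-up off-chip resistance.
    """
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    input_voltage = n_layers * layer_voltage_v    # domains connected in series
    input_current = chip_power_w / input_voltage  # I = P / V
    # Largest impedance for which I * Z stays inside the allowed noise.
    max_impedance = noise_margin * input_voltage / input_current
    i2r_loss = input_current ** 2 * parasitic_res_ohm
    return DeliveryBudget(input_voltage, input_current, max_impedance, i2r_loss)


if __name__ == "__main__":
    # n = 1 is a conventional single-rail chip; n = 4 is an example stack depth.
    for n in (1, 4):
        b = delivery_budget(chip_power_w=100.0, layer_voltage_v=1.0, n_layers=n)
        print(f"n={n}: {b.input_current_a:.1f} A, "
              f"{b.max_impedance_ohm * 1e3:.2f} mOhm budget, "
              f"{b.i2r_loss_w:.3f} W lost off-chip")
```

With `n_layers=1` the sketch reproduces the 100-A draw and 1-mΩ impedance target quoted above. Setting `n_layers=4` shows how the same 100 W appears to the off-chip network when it is delivered at a higher voltage.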
For the same chip power, an n-way stacked system reduces the current draw of the chip by a factor of n, which reduces IR drop by a factor of n and I²R power loss by a factor of n², and alleviates the off-chip impedance requirements for the same voltage noise margin. It also obviates a high step-down off-chip dc–dc converter, which improves off-chip regulator efficiency.

There has been growing interest in integrating dc–dc converters on-chip to perform voltage conversion from the high input voltage levels provided to the chip [12]–[15]. However, such integration usually suffers from inferior conversion efficiency due in part to the poor quality of the inductors and capacitors available on-chip. By stacking voltage domains and obviating the explicit voltage conversion stage, voltage stacking achieves high-efficiency power delivery. Ideally, if the power consumption of all stacked layers perfectly matches, the layer voltages evenly subdivide the high input voltage, and voltage stacking achieves optimal intrinsic step-down voltage conversion with no loss. In practice, interlayer switching activity mismatch exists and, because the layers are connected in series, results in interlayer voltage noise. This susceptibility to voltage noise negatively impacts system performance and energy efficiency and poses system reliability concerns.

To address the voltage noise issue associated with voltage stacking, this paper presents a 16-core four-way voltage-stacked test chip implemented in TSMC’s 40 G process that integrates industry-grade microprocessor cores with a multioutput integrated voltage regulator (IVR) and implements adaptive frequency clocking (AFClk) to mitigate the impact of voltage noise. The IVR provides minimum voltage guarantees while AFClk allows the cores to operate efficiently with minimal margin. With voltage stacking,

1063-8210 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.