Enabling Adaptability Through Elastic Clocks

Emre Tuncer
Elastix Corporation
Los Gatos, CA, USA
emre@elastix-corp.com

Jordi Cortadella
Universitat Politècnica de Catalunya
Barcelona, Spain

Luciano Lavagno
Politecnico di Torino
Torino, Italy

ABSTRACT
The power and performance benefits of scaling are being lost to worst-case margins as the uncertainty of device characteristics increases. Adaptive techniques can dynamically adjust the margins required to tolerate variability and recover a significant part of the benefits lost to worst-case conditions. Additionally, the stringent timing requirements for synthesizing low-skew clock trees lead to higher power consumption and limit adaptability to varying operating conditions. This paper introduces an elastic clocking scheme as an adaptive technique that confronts variability and provides substantial power savings by dynamically adjusting to operating conditions. The synthesis and sign-off analysis of elastic clocks are fully automated, and the required changes to the design flow are handled by automated flow support.

Categories and Subject Descriptors
B.8.2 Performance Analysis and Design Aids.

General Terms
Design, Reliability, Economics.

Keywords
Adaptive voltage scaling, desynchronization, GALS, low power design.

1. INTRODUCTION
Increasing process variability and decreasing operating voltage, as feature sizes scale down, reduce potential power-performance gains. Statistical design methods can reduce overdesign due to unrealistic worst-case assumptions [1], [6]. Large-volume parts can be binned and sold at different price points to recover some portion of the performance-versus-yield trade-off. However, binning is not applicable to ASICs for commercial reasons, and statistical timing analysis does not address the margins needed for environmental variations such as temperature and voltage changes.

Ad-hoc recovery of design margins is commonplace in today’s world.
Off-the-shelf processor parts can be over-clocked well beyond their rated speeds by employing sophisticated cooling. By the same token, reducing the supply voltage to run a fast part at the specified frequency can save energy and power, which are becoming a primary concern for all electronic systems. Adaptive Voltage Scaling (AVS) provides this capability by sensing on-chip conditions dynamically and reducing or increasing the supply voltage to run the part at the required speed [2]. The power gains in AVS, however, are limited by the ability to predict data path delays across the variation space [3].

AVS addresses static (process) or slowly varying (temperature and, to some degree, aging) variations. The response time of the voltage regulation loop is usually hundreds or thousands of clock cycles. Cycle-to-cycle variations, such as IR-drop due to dynamic loads, must be handled by increasing the margins, which reduces the achievable power gains.

Fine-grained application of AVS to individual cores or blocks in an SOC further improves power gains, based on load and performance requirements. However, the clock skew due to voltage domain crossing quickly becomes the limiting factor for performance and increases the hold-time fixing overhead. A solution to overcome this limitation is the adoption of asynchronous communication techniques between blocks [4]. The GALS (Globally Asynchronous Locally Synchronous) approach provides the flexibility to have each block driven by its own separate clock, and possibly its own supply voltage, while still enabling safe communication with other blocks. The main drawback of a GALS approach is the synchronization latency required to cross different clock domains, which may have a significant impact on the performance of the system.

Elastic clocks, where the period is dynamically adjusted to data path delays at the current operating conditions, provide the ability to minimize AVS margins due to IR-drop and clock skew.
They also reduce latency in inter-block communication, owing to the asynchronous nature of the local clock controller protocol. Elastic clocks are implemented in a synchronous design flow through the desynchronization process.

2. DESYNCHRONIZATION
The separation between functionality and performance has always been a cornerstone of digital circuit design, enabling the development of tools that support functional specification using synthesizable Verilog or VHDL, logic synthesis, physical design, equivalence checking, and static timing analysis. Even testing schemes that couple full stuck-at functional testing with limited at-speed performance testing benefit from this separation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC’09, July 26-31, 2009, San Francisco, California, USA Copyright 2009 ACM 978-1-60558-497-3/09/07.....5.00