Copyright © 2015 American Scientific Publishers. All rights reserved. Printed in the United States of America.
Journal of Low Power Electronics, Vol. 11, 1–21, 2015

Runtime Leakage Power Reduction Using Loop Unrolling and Fine Grained Power Gating

Sumanta Pyne and Ajit Pal
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
(Received: 14 October 2014; Accepted: 14 January 2015)

The present work introduces a compilation technique to reduce the runtime leakage power of the functional units of a processor by combining loop unrolling with power gating. The instructions in the unrolled loop are scheduled to provide opportunities for power gating the functional units that are not used for a considerable amount of time. An algorithm is introduced that saves maximum leakage energy without performance loss due to the execution of power gating instructions. The algorithm performs loop unrolling, schedules the instructions, and finally inserts power gating instructions. The present work is explained using two illustrative examples, one without loop-carried dependence and the other with loop-carried dependence. It is observed that the number of clock cycles taken by the power gating instructions is less than or equal to the number of clock cycles saved by loop unrolling. This results in a 23–64% reduction of the total energy consumed by the benchmark programs without any degradation of performance.

Keywords: Clustering of Instructions, Fine Grained Power Gating, Grouping of Instructions, Inter-Iteration Data Dependence, Leakage Power, Loop Unrolling, Power Gating Instructions.

1. INTRODUCTION

The rapid growth in power consumption of a wide range of computing devices, from servers to hand-held embedded devices, has led to the design of energy-efficient hardware and software. Early research on low power computing concentrated on the reduction of dynamic and switching power.
However, in the deep submicron era, leakage power consumption dominates the total power consumption [1–3]. In 250 nm technology, dynamic power was 90% of the total power dissipation [4, 5]. But below 70 nm technology, leakage power dominates the total power dissipation [4–6]. For 25 nm technology, the leakage power is almost 80% of the total power dissipation. The present work is a software-based technique to reduce the runtime leakage power of functional units. Loop unrolling is combined with power gating to achieve a significant reduction in the total power consumption. Programmers and/or compilers can exploit this idea to reduce leakage power when they encounter loops that contain expressions requiring multiple functional units and that can be unrolled.

Author to whom correspondence should be addressed. Emails: sumantapyne@gmail.com, spyne@cse.iitkgp.ernet.in

Loop unrolling (LU) reduces the number of branch instructions to be executed, thereby saving time and energy [7]. But loop unrolling has several disadvantages. As the loop unrolling factor (uf) increases, the size of the code within the body of the unrolled loop increases. This can cause an increase in instruction cache misses, which may degrade performance and increase energy consumption. The increased code size also raises the likelihood of higher register usage (register pressure) in a single iteration to store temporary variables, which may likewise degrade performance and consume more energy. So, the uf has to be chosen judiciously to optimize the total power consumption.

Power gating (PG) [8] is a technique used in integrated circuit design to reduce power consumption by shutting off the blocks that are not in use, thus reducing stand-by or leakage power. However, it increases time delays, as power gated modes have to be safely entered and exited.
Architectural trade-offs exist between designing for the amount of leakage power saved in low power modes and the energy dissipated to enter and exit the low power modes. Shutting down the blocks can be initiated either by software or by hardware.

J. Low Power Electron. 2015, Vol. 11, No. 1; 1546-1998/2015/11/001/021; doi:10.1166/jolpe.2015.1361
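The two ideas introduced above can be illustrated with a minimal sketch. It is not the algorithm of the present work. The function names, the unrolling factor of 4, the per-cycle leakage figure, and the transition energy are all hypothetical, and the `branches` counter is only a model of loop-back branches, not a hardware measurement. The first two functions show how unrolling lowers the number of loop-back branches for the same result; the third shows the trade-off in which gating pays off only when the leakage saved during the idle period exceeds the energy spent entering and exiting the gated mode.

```python
def sum_rolled(a):
    """Reference loop: one modeled loop-back branch per element."""
    total, branches = 0, 0
    for x in a:
        total += x
        branches += 1
    return total, branches


def sum_unrolled_by_4(a):
    """Same computation with an assumed unrolling factor uf = 4."""
    total, branches = 0, 0
    n = len(a)
    i = 0
    # Main unrolled loop: four elements per modeled branch.
    while i + 4 <= n:
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
        i += 4
        branches += 1
    # Remainder loop for the leftover n mod 4 elements.
    while i < n:
        total += a[i]
        i += 1
        branches += 1
    return total, branches


def gating_saves_energy(idle_cycles, leak_per_cycle, transition_energy):
    """Assumed linear model: gating is worthwhile only if the leakage
    avoided while idle exceeds the enter-plus-exit energy cost."""
    return idle_cycles * leak_per_cycle > transition_energy


data = list(range(10))
print(sum_rolled(data))         # same total, 10 modeled branches
print(sum_unrolled_by_4(data))  # same total, fewer modeled branches
print(gating_saves_energy(100, 1.0, 50.0))
print(gating_saves_energy(10, 1.0, 50.0))
```

Under this simple model, a functional unit that is idle for only a few cycles should stay powered, which is why scheduling the unrolled loop body to lengthen idle periods matters.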