Compiler-Based Optimizations Impact on Embedded Software Power Consumption

Mostafa E. A. Ibrahim 1+2, Markus Rupp 1, and S. E.-D. Habib 2
1 Institute of Communications and RF Engineering, Vienna University of Technology, Austria
2 Electronics and Communication Department, Faculty of Engineering, Cairo University, Egypt
Email: {mhalas,mrupp}@nt.tuwien.ac.at, seraged@ieee.org

Abstract: Compilers traditionally are not exposed to the energy details of the processor. In this paper, we present a quantitative study in which we examine the influence of the global performance optimization levels -o0 to -o3 of the Code Composer Studio C/C++ compiler on energy and power consumption. The results show that the most aggressive performance optimization option, -o3, reduces the execution time by 95% on average, while it increases the power consumption by 25%. Moreover, we inspect the effect of the optimizations on other execution characteristics, such as the memory references and the data cache miss rate. The results show that the memory references decrease by 94%, while the IPC increases by 250%, which in turn leads to the increase in consumed power.

I. INTRODUCTION

In recent years, reducing the power dissipation and the energy consumption of a program have become optimization goals in their own right. They are no longer considered a side effect of traditional performance optimizations, which mainly try to reduce program execution time. Power and energy optimizations can be implemented in hardware through circuit design, and by the compiler through compile-time analysis, code reshaping, and hints to the operating system.

Compilers traditionally are not exposed to the energy details of the processor. Current compiler optimizations are tuned primarily for performance and code size. Hence, it is important to evaluate how these optimization options influence the power and energy consumption of the processor while it runs a software kernel.
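To make the distinction between power and energy concrete, the following minimal sketch applies the relation energy = average power x execution time to the average figures quoted in the abstract (95% shorter execution time, 25% higher power at -o3). The function name `relative_energy` and its parameters are illustrative assumptions, not part of the paper, and multiplying two benchmark averages only approximates the per-benchmark energy ratios.

```python
def relative_energy(time_reduction: float, power_increase: float) -> float:
    """Return the optimized energy as a fraction of the baseline energy.

    Assumes energy = average power x execution time, so the energy ratio
    is the product of the time ratio and the power ratio.
    Both arguments are fractions (0.95 means 95%).
    """
    time_ratio = 1.0 - time_reduction    # e.g. 95% shorter run -> 0.05
    power_ratio = 1.0 + power_increase   # e.g. 25% more power -> 1.25
    return time_ratio * power_ratio


if __name__ == "__main__":
    # Average figures taken from the abstract for the -o3 level.
    ratio = relative_energy(time_reduction=0.95, power_increase=0.25)
    print(f"energy relative to baseline: {ratio:.4f}")
    print(f"energy saving: {(1.0 - ratio) * 100:.2f}%")
```

Under these assumptions the energy falls to roughly 6% of the baseline even though the power rises, which is why power and energy have to be reported separately.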
In this paper, we present a quantitative study in which we examine the effect of the global optimization levels -o0 to -o3 of the compiler on the energy and power consumption of the targeted processor. The targeted processor in our experiments is the fixed-point VLIW TMS320C6416T DSP from Texas Instruments (referred to as C6416T for the rest of the paper). In our experiments we use the C/C++ compiler of Code Composer Studio (CCS) from Texas Instruments to produce the code binaries. This compiler contains many optimizations, including those that specifically target the capabilities and features of the C6416T.

First, we evaluate the effect of the global compiler optimization options on the energy and power consumption of the targeted processor. Second, we analyze the effect on other performance measures such as the memory references, the cache miss rate, the instructions per cycle (IPC), and the CPU stall cycles.

The paper describes prior research related to this work in Section II and presents a general overview of the experimental platform in Section III. The results of invoking various global compiler optimizations and their effect on the power and energy are shown in Section IV. Finally, the conclusions are drawn in Section V.

This work has been funded by the Christian Doppler Laboratory for Design Methodology of Signal Processing Algorithms, as well as by the COMET-funded K-project Embedded Computer Vision.

II. PREVIOUS WORK

In recent years, several attempts have been made to understand the scope of compiler optimizations from the perspective of the power dissipation and energy consumption of programmable processors.

Tiwari et al. [1] presented an instruction-level power model for a Fujitsu 3.3 V, 40 MHz DSP. They also illustrated the effect of two architectural features (dual memory accesses and the packing of instructions into pairs) on the energy consumption.
With the help of a cycle-accurate energy simulator (SimplePower), a source-to-source code translator, and a number of benchmark codes, Kandemir et al. [2] studied the influence of five high-level compiler optimizations, such as loop unrolling and loop fusion, on energy consumption.

Valluri et al. [3] evaluated some general and specific optimizations in terms of the power and energy consumption of the Alpha processor while running some SpecInt95 and SpecFp95 benchmarks. The processor in their work was simulated by means of Wattch, a framework for analyzing processor power consumption at the architectural level [4].

Chakrapani et al. [5] also presented a study of the effect of compiler optimizations on the energy usage of an embedded processor. Their work targets an ARM embedded core and uses an RTL-level model along with Synopsys Power Compiler to estimate power.

Seng et al. [6] examined the effect of the general and specific optimizations of the Intel compiler on the energy and power consumption of a Pentium 4 processor running some benchmarks extracted from Spec2000.

Zafar et al. [7] examined the effect of the loop unrolling factor, grafting depth, and blocking factor on the energy and performance of the Philips Nexperia media processor PNX1302. However, they use the terms energy and power interchangeably. Hence, the improvement in energy in their work is directly related to the performance enhancement.

Copyright 2001 NEWCAS-TAISA. Published in the Proceedings of the Joint Conference NEWCAS-TAISA, June 28 - July 1, 2009, Toulouse, France.