An Incremental Methodology for Energy Measurement and Modeling

Abdelhafid Mazouz *, Intel Corporation, Champaign, IL, USA, first.last@gmail.com
David C. Wong, Intel Corporation, Austin, TX, USA, first.c.last.@intel.com
David Kuck, Intel Corporation, Austin, TX, USA, first.last@intel.com
William Jalby, UVSQ/ECR, Versailles, France, first.last@uvsq.fr

ABSTRACT
This paper presents an empirical approach to measuring and modeling the energy consumption of multicore processors. The modeling approach allows us to break down the energy consumption among a set of key hardware components, also called HW nodes. We explicitly model the front-end and the back-end in terms of the number of instructions executed. We also model the L1, L2 and L3 caches. Furthermore, we explicitly model the static and dynamic energy consumed by the uncore and core components. From a software perspective, our methodology allows us to correlate energy with the executed code, which helps find opportunities for code optimization and tuning.

We use binary analysis and hardware counters for performance characterization. Although we use the on-chip counters (RAPL) for energy measurement, our methodology does not rely on a specific method of energy measurement. Thus, it is portable and easy to deploy in various computing environments. We validate our energy model on two Intel processors with a set of HPC codelets whose data sizes are varied so that the data comes from the L1, L2 and L3 caches, and we show a 3% average modeling error. We present a comprehensive analysis, show energy consumption differences between kernels, and relate those differences to the algorithms that are implemented. Finally, we discuss how vectorization leads to energy savings compared to non-vectorized codes.

CCS Concepts
•Computing methodologies → Modeling methodologies; •Hardware → Power and energy; •Computer systems organization → Multicore architectures;

* Abdelhafid Mazouz contributed to this work as an Intel employee.
He is now at Bull/Atos.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICPE’17, April 22-26, 2017, L’Aquila, Italy
© 2017 ACM. ISBN 978-1-4503-4404-3/17/04...$15.00
DOI: http://dx.doi.org/10.1145/3030207.3030224

Keywords
Energy modeling; Performance evaluation; RAPL

1. INTRODUCTION
Power and energy models are widely used in the context of dynamic power management and DVFS controllers [1, 3, 22]. They can serve as a tool to dynamically find the hardware (HW) parameters that are best suited for a given workload in a computing system. Accurate power and energy models of HW are also important components of software (SW) development tools and HW/SW codesign tools [11, 12]. Generally, one needs a model in which power or energy is related to the amount of each HW resource used in a given computation. We define HW resources (called HW nodes) as HW components that can enhance the overall performance. Such a model expresses system or subsystem power or energy as the sum of individual HW node contributions, so that the contribution of each node to a given computation can be understood. This paper gives a general procedure for generating such models by iteratively refining high-level measurements down to lower-level HW details, and in the limit to individual operations and instructions.

Our methodology is portable, as we rely on HW counters for performance and energy measurement, which are available on most modern general purpose processors.
In this paper, we apply our methodology to estimate the energy consumption of multicore processors. We use the Running Average Power Limit (RAPL) interfaces [5, 10] for energy measurement and estimation. Because we focus on Intel microprocessors, these HW interfaces are available to estimate the core and uncore energy. Although errors are observed in such HW estimates [9, 16], using them allows us to deploy and apply our methodology in various computing environments without the need for physical HW probes. If physical HW probes or an accurate high-level simulator were available, our methodology could be applied directly to the energy measurements they provide. The general procedure incrementally produces as much detail as can be isolated by micro-benchmark measurements and HW counters. The two key parameters in our model are static power and dynamic energy consumption. We give lumped static power estimates for core and uncore, as well as dynamic energy contributions down to low-level nodes.
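
The additive structure described above can be illustrated with a minimal sketch. It assumes that total energy is a lumped static power multiplied by the run time, plus, for each HW node, a dynamic energy per event multiplied by an event count taken from HW counters. The class name, the node names, the choice of events and all coefficient values are hypothetical and serve only as an illustration; they are not the fitted model or the values from this paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EnergyModel:
    """Illustrative additive energy model (hypothetical, not the paper's fitted model)."""

    # Lumped static power in watts (assumed to cover core and uncore).
    static_power_w: float
    # Dynamic energy in joules per counted event, keyed by HW node name.
    energy_per_event_j: Dict[str, float]

    def estimate(self, runtime_s: float, event_counts: Dict[str, int]) -> Dict[str, float]:
        """Return a per-node energy breakdown in joules, plus the total."""
        # Static contribution: power integrated over the run time.
        breakdown = {"static": self.static_power_w * runtime_s}
        # Dynamic contribution of each HW node: per-event energy times event count.
        for node, count in event_counts.items():
            breakdown[node] = self.energy_per_event_j[node] * count
        # The total is the sum of all individual contributions.
        breakdown["total"] = sum(breakdown.values())
        return breakdown


# Hypothetical coefficients and counter values, chosen only to show usage.
model = EnergyModel(
    static_power_w=10.0,
    energy_per_event_j={"front_end": 2e-9, "back_end": 3e-9, "L1": 1e-9},
)
counts = {"front_end": 1_000_000_000, "back_end": 1_000_000_000, "L1": 500_000_000}
for node, joules in model.estimate(runtime_s=2.0, event_counts=counts).items():
    print(f"{node}: {joules:.3f} J")
```

Keeping the static term separate from the per-node dynamic terms mirrors the two key parameters named in the text, and returning the breakdown rather than only the total makes the contribution of each node to a given computation visible.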