Full-System Simulation of big.LITTLE Multicore Architecture for Performance and Energy Exploration

Anastasiia Butko, Florent Bruguier, Abdoulaye Gamatié, Gilles Sassatelli, David Novo, Lionel Torres and Michel Robert
LIRMM (CNRS and University of Montpellier), Montpellier, France
Email: {firstname.lastname}@lirmm.fr

Abstract—Single-ISA heterogeneous multicore processors have gained increasing popularity with the introduction of recent technologies such as ARM big.LITTLE. These processors offer increased energy efficiency by combining low-power in-order cores with high-performance out-of-order cores. Efficiently exploiting this attractive feature requires careful management so as to meet the demands of targeted applications. In this paper, we explore the design of such architectures based on the ARM big.LITTLE technology by modeling performance and power in the gem5 and McPAT frameworks. Our models are validated w.r.t. the Samsung Exynos 5 Octa (5422) chip. We show average errors of 20% in execution time, 13% in power consumption and 24% in energy-to-solution.

Keywords—Full-system simulation, single-ISA heterogeneous, multicore, gem5, McPAT, performance, energy, accuracy, ARM big.LITTLE.

I. INTRODUCTION

To meet rapidly growing demands, future computing systems will need to be increasingly scalable and energy-efficient. Heterogeneous systems have become a promising direction for building architectures that provide the required compromise between performance and power dissipation. Such architectures usually consist of various processors/cores that differ from each other in their instruction set architectures (ISAs), their execution paradigms, e.g. in-order and out-of-order, their cache sizes and other fundamental characteristics. In particular, single-ISA heterogeneous multicore processors [1] are made of multiple sets of cores that nevertheless share a common ISA.
They can therefore run a single standard operating system that takes advantage of load-balancing features for fine control over performance and power consumption. There are three software execution modes that aim to exploit the provided heterogeneity: (i) cluster migration, (ii) core migration and (iii) heterogeneous multiprocessing (HMP) [2]. Whereas the other modes use only part of the available resources, HMP mode allows all of the cores to be used simultaneously and enables fine-grained control of task scheduling.

In the mobile market, several systems-on-chip (SoCs) operating on that principle exist. The Nvidia Tegra 3/4 SoC [3] implements Variable Symmetric Multiprocessing (vSMP) technology, which combines four faster, power-hungry cores with one ‘companion’ core dedicated to background tasks. All five cores have a similar architecture, but the main cores are built in a standard silicon process to reach higher frequencies, whereas the ‘companion’ core is built in a special low-power silicon process and executes tasks at a low frequency [4]. ARM big.LITTLE technology, integrated into the Samsung Exynos 5/7 Octa SoC [5], combines two different types of cores. Developers reported energy savings of over 50% for popular activities such as web browsing and music playback with the Cortex-A7/Cortex-A15 pairing [6].

The choice of architecture parameters, such as the core types, the symmetric/asymmetric configurations and the cache size, is crucial for system energy efficiency. In [7], the authors aim to provide fundamental design insights based on a high-level analytical model. In particular, they claim that using two core types is the most beneficial configuration, and they stress the importance of the task-to-core scheduling policy. Unlike analytical model-based estimation techniques [8], [9], full-system (FS) simulators provide a broad range of architecture configurations for detailed design exploration.
They enable realistic software execution, including the operating system, runtime scheduling and parallel workloads.

Our contributions. In this work, we evaluate performance and power models of the ARM big.LITTLE architecture for exploring performance and energy trade-offs. The models are implemented in the gem5 [10] and McPAT [11] simulation frameworks. The accuracy of both the performance and power estimations is assessed by comparison with a reference Exynos 5 Octa (5422) SoC integrated in the Odroid-XU3 computer board. This study is conducted using the OpenMP implementation of the Rodinia benchmark suite [12].

The main contributions of the present paper can be summarized as follows:

• Cycle-approximate performance and power models of the ARM big.LITTLE heterogeneous processor are defined and implemented. These models are validated w.r.t. the real Exynos 5 Octa (5422) system-on-chip and show average errors around 20% for performance, 13% for