Harnessing ISA Diversity: Design of a Heterogeneous-ISA Chip Multiprocessor
Ashish Venkat, Dean M. Tullsen
University of California, San Diego
{asvenkat,tullsen}@cs.ucsd.edu

Abstract
Heterogeneous multicore architectures have the potential for high performance and energy efficiency. These architectures may be composed of small power-efficient cores, large high-performance cores, and/or specialized cores that accelerate the performance of a particular class of computation. Architects have explored multiple dimensions of heterogeneity, both in terms of microarchitecture and specialization. While early work constrained the cores to share a single ISA, this work shows that allowing heterogeneous ISAs further extends the effectiveness of such architectures. This work exploits the diversity offered by three modern ISAs: Thumb, x86-64, and Alpha. This architecture has the potential to outperform the best single-ISA heterogeneous architecture by as much as 21%, with 23% energy savings and a reduction of 32% in Energy Delay Product.

1. Introduction
Architects have proposed heterogeneous chip multiprocessors for both general-purpose computing and embedded applications. These architectures exploit heterogeneity in two fundamental dimensions. While some architectures make use of specialized hardware to accelerate the performance of certain workloads [1, 2, 3, 19], others employ a different set of microarchitectural parameters [4, 15, 16, 22, 23, 24] in order to create energy-efficient processors for mixed workloads. The latter constrain the cores to execute a single instruction set architecture (ISA), maximizing efficiency by allowing a thread to dynamically identify, and migrate to, the core to which it is most suited during a particular phase and under the current environmental constraints.
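The per-phase core selection described above can be sketched as a small cost comparison. This is a minimal illustration under stated assumptions, not the runtime used in this work: it assumes that estimates of execution time and average power are already available for every candidate core in every phase, and it uses Energy Delay Product (energy × delay, i.e. power × time²) as the selection objective. The `PhaseEstimate`, `best_core`, and `schedule` names, the core labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseEstimate:
    """Hypothetical per-phase estimate for one candidate core."""
    core: str       # illustrative core label, e.g. "thumb-small"
    time_s: float   # estimated execution time of the phase on this core (s)
    power_w: float  # estimated average power while running the phase (W)

    @property
    def energy_j(self) -> float:
        # Energy = average power x execution time
        return self.power_w * self.time_s

    @property
    def edp(self) -> float:
        # Energy Delay Product = energy x delay = power x time^2
        return self.energy_j * self.time_s


def best_core(estimates: list[PhaseEstimate]) -> str:
    """Return the label of the candidate core with the lowest EDP."""
    if not estimates:
        raise ValueError("need at least one candidate core")
    return min(estimates, key=lambda e: e.edp).core


def schedule(phases: dict[str, list[PhaseEstimate]]) -> dict[str, str]:
    """Assign each phase to its lowest-EDP core, ignoring migration cost."""
    return {name: best_core(ests) for name, ests in phases.items()}


# Made-up numbers for two phases of one program on three candidate cores.
phases = {
    "phase_A": [
        PhaseEstimate("thumb-small", time_s=2.0, power_w=0.5),  # EDP 2.0
        PhaseEstimate("x86-big", time_s=1.0, power_w=4.0),      # EDP 4.0
        PhaseEstimate("alpha-mid", time_s=1.3, power_w=1.5),    # EDP ~2.54
    ],
    "phase_B": [
        PhaseEstimate("thumb-small", time_s=4.0, power_w=0.5),  # EDP 8.0
        PhaseEstimate("x86-big", time_s=1.0, power_w=3.0),      # EDP 3.0
        PhaseEstimate("alpha-mid", time_s=2.0, power_w=1.5),    # EDP 6.0
    ],
}

print(schedule(phases))  # {'phase_A': 'thumb-small', 'phase_B': 'x86-big'}
```

With these made-up numbers, phase_A runs on the low-power Thumb core and phase_B on the large x86-64 core, even though the x86-64 core is the fastest in both phases. The sketch ignores migration overhead and places no restriction on which ISA each candidate core implements; the single-ISA designs cited above would restrict every candidate to one shared ISA.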
This paper demonstrates that not only is that constraint unnecessary, but limiting an architecture to a single ISA restricts the potential heterogeneity, sacrificing performance and efficiency gains.

A critical step in the design of a heterogeneous-ISA architecture is choosing a diverse set of ISAs. While ISAs seem to converge over time (RISC ISAs adding complex operations, CISC ISAs translated to RISC μops internally), there remains sufficient diversity in existing modern ISAs to provide useful heterogeneity. We examine some key aspects that characterize ISA diversity. These include code density, decode and instruction complexity, register pressure, native floating-point arithmetic vs. emulation, and SIMD processing.

In this paper, we harness the diversity offered by three ISAs: ARM’s Thumb [5], x86-64 [17], and Alpha [12]. By co-designing the hardware architectures and the ISAs to provide the best aggregate architecture, we arrive at a more effective and efficient design than one composed of homogeneous cores, or even heterogeneous cores that share a single ISA.

The design of a heterogeneous-ISA chip multiprocessor involves navigating a complex search space, made larger by the additional dimension of freedom that the choice of ISA introduces. A major contribution of this work is such a design space exploration, aimed at finding an optimal heterogeneous-ISA CMP for general-purpose mixed workloads. Based on the results of this exploration, we provide architects with a set of tools to enable ISA-microarchitecture co-design and thereby streamline their search processes.

To reap the full benefits of the heterogeneity, especially the heterogeneity available in the form of ISA diversity, it is important that an application be able to migrate freely between the cores. However, migration in a heterogeneous-ISA environment is a well-known difficult problem [14, 31, 37].
This is because the runtime state of a program is kept in ISA-specific form, and migration to a different ISA involves expensive program state transformation. DeVuyst et al. [11] demonstrate that migration between ISAs can be achieved at acceptable cost on a CMP; however, that work does not explore the architectural advantages of multiple ISAs on a single CMP. This research builds on several ideas from that work, but also introduces several new optimizations to reduce the overhead of migration.

In this paper, we present a detailed compilation methodology and an effective runtime strategy that work for a diverse set of ISAs. We observe that even a single application can gain a performance benefit of up to 11.2% by migrating between heterogeneous-ISA cores during different phases of its execution. Finally, we evaluate the proposed heterogeneous-ISA CMP against both homogeneous and single-ISA heterogeneous CMPs, under varying power and area budgets. Overall, we make the following major observations:
• Co-design of ISA and microarchitectural parameters is critical. In the optimal designs, cores employing different ISAs tend to naturally diverge, and to diverge in consistent directions.
• ISA heterogeneity is beneficial not only across applications, but also within individual applications across phases.

We find that heterogeneous-ISA CMPs can improve single-thread performance by an average of 20.8% and provide 15.8% more throughput on multi-programmed mixed workloads, compared to a single-ISA heterogeneous CMP. Additionally, heterogeneous-ISA CMPs can help achieve an average reduction of 29.8% in Energy Delay Product.

The rest of this paper is organized as follows. Section 2 describes related work. Section 3 evaluates the diversity offered by the ISAs chosen for this work. Section 4 lays out our design methodology. We present our compilation and runtime methodologies in Section 5. Section 6 describes our experimental methodology.
Section 7 evaluates the proposed
978-1-4799-4394-4/14/$31.00 © 2014 IEEE 121