Multi-Objective Hardware-Software Co-Optimization for the SNIPER Multi-Core Simulator

Radu CHIS¹
¹ Computer Science Department, Technical University of Cluj-Napoca, Romania
radu.chis@gmail.com

Lucian VINTAN¹,²
² Computer Science & Electrical Engineering Department, “Lucian Blaga” University of Sibiu, Romania
lucian.vintan@ulbsibiu.ro

Abstract: Modern complex multicore microarchitectures such as CPUs, APUs (accelerated processing units) and GPUs require hundreds or thousands of hardware parameters to be fine-tuned to obtain the best results with respect to different objectives, such as performance, hardware complexity (integration area), power consumption and temperature. These are only a few of the objectives that must be taken into consideration when designing a new multicore system. Exploring this huge design space requires dedicated tools, such as automatic design space exploration (DSE) frameworks, to optimize the hardware parameters. Besides the complexity of the microarchitecture itself, application performance also depends heavily on the degree of software optimization, which adds a new challenge to the DSE process. In this paper, we use the multi-objective DSE tool FADSE to optimize the hardware and software parameters of the SNIPER multicore simulator running the SPLASH-2 benchmark suite. Both hardware parameters (number of cores, cache sizes, cache associativity, etc.) and software parameters (GCC optimizations, number of threads and scheduler) were varied during the DSE process, and we show the impact of these parameters on the optimization objectives (performance, area and power consumption). Furthermore, the temperatures of the best Pareto configurations found are computed, yielding a 4-dimensional objective space.

Keywords: Design Space Exploration, Multi-objective Optimization Algorithms, Sniper Multi-Core Simulator, SPLASH-2 benchmarks

I. INTRODUCTION

Modern systems require increasingly complex and novel processor architectures. Current desktop CPUs already have 4 to 8 complex cores, and modern phones and tablets also have CPUs with 4 to 8 cores. Accelerated processing units (APUs) combine a CPU with an integrated GPU to deliver performance comparable to dedicated GPUs, but at a fraction of the power consumption.

These new architectures bring multiple new research challenges. For example, the hardware parameters of the microarchitecture have to be optimized together with the target applications (hardware-software co-optimization). The search space created by all possible processor/compiler/application configurations is far too large to be evaluated exhaustively.

As the number of cores on a chip increases, the performance increases, but so do the power consumption and temperature. Because transistor dimensions have shrunk from year to year, the integration area of CPUs has not increased significantly. The three objectives, performance (instructions per cycle, IPC), power consumption and area, conflict with each other: IPC has to be maximized, while power consumption and area have to be minimized.

Optimizing the hardware alone might not be sufficient. Co-optimizing the workload and the hardware in a hardware-software co-optimization process can lead to much better results. Optimizing the system’s performance within given power, area and energy budgets is a widely adopted approach in the design of smartphones, tablets and laptops/desktops.

Evolutionary algorithms have been used to handle these conflicting objectives. Different heuristics and metaheuristics have been implemented in Automatic Design Space Exploration (ADSE) tools. These tools find, in a feasible amount of time, an approximation of the Pareto front: the set of configurations for which no objective can be improved without worsening another. One of these ADSE tools is FADSE [3] (Framework for Automatic Design Space Exploration). It was developed by former Ph.D. student Horia Calborean under the supervision of Prof.
Lucian Vintan at the “Lucian Blaga” University of Sibiu. FADSE automatically performs computation-intensive searches using state-of-the-art multi-objective evolutionary algorithms guided by human expertise, as presented in our previous works [1], [2].

In this article we present the automatic design space exploration of the Sniper [4] multicore simulator with FADSE. The exploration covers both hardware parameters (number of cores, cache sizes, cache associativity) and software parameters (GCC optimizations, number of threads and scheduler), using the NSGA-II metaheuristic from the jMetal library. We show the impact of these parameters on the objectives we set: CPI (clock cycles per instruction, the reciprocal of IPC), area and energy. At the end of the DSE process, we also compute the temperatures of the best configurations found, thus obtaining a 4-dimensional objective space.

This article is structured as follows: Section 2 provides an overview of the related work. The tools (FADSE and Sniper) are presented in Section 3, while Design Space Exploration and Optimization concepts along with the objectives used and the