HASS: A Scheduler for Heterogeneous Multicore Systems

Daniel Shelepov (1) dsa5@cs.sfu.ca
Juan Carlos Saez Alcaide (2) jcsaezal@fdi.ucm.es
Stacey Jeffery (3) sjeffery@uwaterloo.ca
Alexandra Fedorova (1) fedorova@cs.sfu.ca
Nestor Perez (1) npa5@sfu.ca
Zhi Feng Huang (1) zfh@sfu.ca
Sergey Blagodurov (1) sba70@cs.sfu.ca
Viren Kumar (1) vka4@cs.sfu.ca

(1) Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada
(2) Complutense University of Madrid, Ciudad Universitaria – 28040, Madrid, Spain
(3) University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada

Abstract

Future heterogeneous single-ISA multicore processors will have an edge in potential performance per watt over comparable homogeneous processors. To fully tap into that potential, the OS scheduler needs to be heterogeneity-aware, so that it can match jobs to cores according to the characteristics of both. We propose a Heterogeneity-Aware Signature-Supported (HASS) scheduling algorithm that does the matching using per-thread architectural signatures, which are compact summaries of threads' architectural properties collected offline. The resulting algorithm does not rely on dynamic profiling, and is comparatively simple and scalable. We implemented HASS in OpenSolaris and achieved average workload speedups of up to 13%, matching the best static assignment, which is achievable only by an oracle. We have also implemented a previously proposed dynamic IPC-driven algorithm that relies on online profiling. We found that the complexity, load imbalance and associated performance degradation resulting from dynamic profiling are significant challenges to using this algorithm successfully. As a result, it failed to deliver the expected performance gains and to outperform HASS.

Categories and Subject Descriptors
D.4.1 [Operating Systems]: Process Management – scheduling.

General Terms
Algorithms, Management, Performance, Design.

Keywords
heterogeneous, multicore, scheduling, asymmetric, architectural signatures.

1. Introduction

Single-ISA heterogeneous multicore processors, also known as asymmetric single-ISA (ASISA) processors [18], consist of cores that expose the same ISA but deliver different performance. These cores differ in clock frequency, power consumption, and possibly in cache size and other microarchitectural features. Asymmetry may be built in by design [14][15], or may occur due to process variation [13] or explicit clock frequency scaling.

Given a diverse workload, an ASISA system can deliver more performance per watt than a homogeneous system, because threads can be matched to cores according to the relative benefit that they derive from running on different core types. For example, in an ASISA system with several fast and powerful cores (high clock speed, ILP-oriented optimizations) and several simple and slow cores, memory-bound threads should typically be mapped to slow cores, because the speedup they experience on fast cores relative to slow cores is disproportionately small compared with the additional power they consume. Power and area efficiencies of ASISA systems have been demonstrated in numerous studies [3][14][15][16][18]. In addition, asymmetric systems allow superior performance for mixed workloads of sequential and parallel applications [10].

Efficiency of ASISA systems is maximized when workloads are matched with cores according to the properties of the workload and the features of the core. This matching is typically done by a heterogeneity-aware (het.-aware from now on, for brevity) scheduling algorithm in the operating system. In this paper we describe a new het.-aware scheduling algorithm whose methodology differs from those proposed in the past.

Our algorithm, called the Het.-Aware Signature-Supported (HASS) scheduler, is based on the idea of architectural signatures. An architectural signature is a compact summary of architectural properties of an application.
It may contain information about memory-boundedness, available ILP, sensitivity to variations in clock speed, and other parameters. The common property of these parameters is that the scheduler can interpret them relatively easily and quickly to determine how well a given application "matches" a given core. The signatures are generated offline and are presented to the scheduler as a single unit with the application binary, perhaps by being embedded into the binary itself. The scheduler then matches jobs with cores based on these signatures.

Unlike HASS, previously proposed het.-aware algorithms determined the best matching of threads to cores via online performance monitoring [3][15], which measured the relative speedup of each thread on different core types. As the number of cores (and core types) on the chip increases [1][5], the overhead of performance monitoring grows, and monitoring becomes less practical as a means of determining the optimal assignment. Our scheme does not use online monitoring and thus removes the overhead associated with it, at the cost of some accuracy.

The static nature of HASS imposes some limitations on its structure and functionality, and so it is important to investigate their impact. First of all, in the current implementation there is only one signature per application. While this scheme can be extended to multithreaded applications with relative ease, it is more difficult to accommodate different input sets, which can sometimes cause significant changes in application behaviour and therefore in optimal thread-to-core mappings. We investigated the