J Supercomput
DOI 10.1007/s11227-017-2176-6

Hierarchical multicore thread mapping via estimation of remote communication

Hamidreza Khaleghzadeh¹ · Hossein Deldari² · Ravi Reddy¹ · Alexey Lastovetsky¹

© Springer Science+Business Media, LLC 2017

Abstract Affinity-aware thread mapping is a method to effectively exploit cache resources in multicore processors. We propose an affinity- and architecture-aware thread mapping technique that maximizes data reuse and minimizes the remote communication and cache coherency costs of multi-threaded applications. It consists of three main components: Data Sharing Estimator, Affine Mapping Finder and Maximum Speedup Predictor. Data Sharing Estimator creates application-specific data dependency signatures, which Affine Mapping Finder uses to determine an appropriate thread mapping of the application for a given architecture. To prevent excessive thread migration, Maximum Speedup Predictor estimates the speedup of the obtained mapping and discards the mapping if it yields no significant performance improvement. The proposed framework is evaluated using the Phoenix benchmark suite on two different multicore architectures. The proposed thread mapping approach gives a 25% improvement in performance compared to the default Linux scheduler. We also show that affinity-based thread mapping approaches that consider only the number of shared blocks are not sufficient to accurately estimate data dependency between threads and determine the proper thread mapping.

✉ Hamidreza Khaleghzadeh
hamidreza.khaleghzadeh@ucdconnect.ie

Hossein Deldari
hdeldari@salman.ac.ir

Ravi Reddy
ravi.manumachu@ucd.ie

Alexey Lastovetsky
alexey.lastovetsky@ucd.ie

¹ School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
² Salman Institute of Higher Education, Mashhad, Iran