An Autotuning Protocol to Rapidly Build Autotuners
JUNHONG LIU, GUANGMING TAN, and YULONG LUO, State Key Laboratory of Computer
Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese
Academy of Sciences
JIAJIA LI, Computational Science and Engineering, Georgia Institute of Technology
ZEYAO MO, Institute of Applied Physics and Computational Mathematics
NINGHUI SUN, State Key Laboratory of Computer Architecture, Institute of Computing Technology,
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Automatic performance tuning (autotuning) is an increasingly critical technique for achieving portable high performance in exascale applications. However, constructing an autotuner from scratch remains a challenge, even for domain experts. In this work, we propose a performance tuning and knowledge management
suite (PAK) to help rapidly build autotuners. In order to accommodate existing autotuning techniques, we
present an autotuning protocol that is composed of an extractor, producer, optimizer, evaluator, and learner.
To achieve modularity and reusability, we also define programming interfaces for each protocol component
as the fundamental infrastructure, which provides a customizable mechanism to deploy knowledge mining
in the performance database. PAK’s usability is demonstrated by studying two important computational ker-
nels: stencil computation and sparse matrix-vector multiplication (SpMV). Our proposed autotuner based on
PAK shows comparable performance and higher productivity than traditional autotuners by writing just a
few tens of code using our autotuning protocol.
CCS Concepts: • Computing methodologies → Parallel programming languages; • Software and its engineering → Application specific development environments;
Additional Key Words and Phrases: Autotuner, knowledge database, protocol, stencil, SpMV
ACM Reference format:
Junhong Liu, Guangming Tan, Yulong Luo, Jiajia Li, Zeyao Mo, and Ninghui Sun. 2019. An Autotuning Pro-
tocol to Rapidly Build Autotuners. ACM Trans. Parallel Comput. 5, 2, Article 9 (January 2019), 25 pages.
https://doi.org/10.1145/3291527
1 INTRODUCTION
Multi- and many-core techniques have increased the complexity and diversity of architectures and programming models, making it harder to develop high-performance programs with reasonable efficiency. Traditionally, tuning code by hand is trick-intensive and requires programmers to be
This work is supported by the National Key Research and Development Program of China (2016YFB0201305,
2016YFB0200504, 2017YFB0202105, 2016YFB0200803, 2016YFB0200300) and National Natural Science Foundation of China,
under Grant nos. 61521092, 91430218, 31327901, 61472395, and 61432018.
Authors’ addresses: J. Liu, G. Tan, Y. Luo, J. Li, and N. Sun, Institute of Computing Technology, Chinese Academy of Sci-
ences; Z. Mo, Institute of Applied Physics and Computational Mathematics; emails: liujunhong@ncic.ac.cn, tgm@ict.ac.cn,
luoyulong@ncic.ac.cn, jiajiali@gatech.edu, zeyao_mo@iapcm.ac.cn, snh@ict.ac.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
© 2019 Association for Computing Machinery.
2329-4949/2019/01-ART9 $15.00
https://doi.org/10.1145/3291527
ACM Transactions on Parallel Computing, Vol. 5, No. 2, Article 9. Publication date: January 2019.