An Autotuning Protocol to Rapidly Build Autotuners

JUNHONG LIU, GUANGMING TAN, and YULONG LUO, State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences
JIAJIA LI, Computational Science and Engineering, Georgia Institute of Technology
ZEYAO MO, Institute of Applied Physics and Computational Mathematics
NINGHUI SUN, State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences

Automatic performance tuning (autotuning) is an increasingly critical technique for achieving portable high performance in exascale applications. However, constructing an autotuner from scratch remains a challenge, even for domain experts. In this work, we propose a performance tuning and knowledge management suite (PAK) to help rapidly build autotuners. To accommodate existing autotuning techniques, we present an autotuning protocol composed of an extractor, producer, optimizer, evaluator, and learner. To achieve modularity and reusability, we also define programming interfaces for each protocol component as the fundamental infrastructure, which provides a customizable mechanism to deploy knowledge mining in the performance database. PAK's usability is demonstrated by studying two important computational kernels: stencil computation and sparse matrix-vector multiplication (SpMV). Autotuners built on PAK show performance comparable to that of traditional autotuners and higher productivity, requiring only a few tens of lines of code written against our autotuning protocol.
CCS Concepts: • Computing methodologies → Parallel programming languages; • Software and its engineering → Application specific development environments;

Additional Key Words and Phrases: Autotuner, knowledge database, protocol, stencil, SpMV

ACM Reference format:
Junhong Liu, Guangming Tan, Yulong Luo, Jiajia Li, Zeyao Mo, and Ninghui Sun. 2019. An Autotuning Protocol to Rapidly Build Autotuners. ACM Trans. Parallel Comput. 5, 2, Article 9 (January 2019), 25 pages.
https://doi.org/10.1145/3291527

1 INTRODUCTION

Multi-core and many-core technology has increased the complexity and diversity of architectures and programming models, making it harder to develop high-performance programs with reasonable efficiency. Traditionally, tuning code by hand is trick-intensive and requires programmers to be

This work is supported by the National Key Research and Development Program of China (2016YFB0201305, 2016YFB0200504, 2017YFB0202105, 2016YFB0200803, 2016YFB0200300) and the National Natural Science Foundation of China, under Grant nos. 61521092, 91430218, 31327901, 61472395, and 61432018.

Authors' addresses: J. Liu, G. Tan, Y. Luo, J. Li, and N. Sun, Institute of Computing Technology, Chinese Academy of Sciences; Z. Mo, Institute of Applied Physics and Computational Mathematics; emails: liujunhong@ncic.ac.cn, tgm@ict.ac.cn, luoyulong@ncic.ac.cn, jiajiali@gatech.edu, zeyao_mo@iapcm.ac.cn, snh@ict.ac.cn.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
© 2019 Association for Computing Machinery.
2329-4949/2019/01-ART9 $15.00
https://doi.org/10.1145/3291527

ACM Transactions on Parallel Computing, Vol. 5, No. 2, Article 9. Publication date: January 2019.