Robust Query Execution Time Prediction for Concurrent Workloads on Massive Parallel Processing Databases

Zhihao Zheng1, Yuanzhe Bei2, Hongyan Sun3, and Pengyu Hong1(✉)

1 Brandeis University, Waltham, MA 02453, USA
{zhihaozh,hongpeng}@brandeis.edu
2 MicroFocus Vertica, Cambridge, MA 02140, USA
yuanzhe.bei@microfocus.com
3 YinTech Innovation Labs, Waltham, MA 02451, USA
hongyan.sun@yintechlabs.com

Abstract. Reliable query execution time prediction is a desirable feature for modern databases because it can greatly ease database administration and is the foundation of various database management/automation tools. Most existing studies on modeling query execution time assume that each individual query is executed as a sequence of serialized steps. However, with growing data volumes and the demand for low query latency, large-scale databases have been adopting the massive parallel processing (MPP) architecture. In this paper, we present a novel machine-learning-based approach for building a robust model that estimates query execution time by considering both query-based statistics and real-time system attributes. The experimental results demonstrate that our approach can reliably predict query execution time in both idle and noisy environments at random levels of concurrency. In addition, we found that both query and system factors are crucial for making stable predictions.

Keywords: Query execution · Machine learning · Concurrent

1 Introduction

Commercial databases hold companies' most critical information and need to be maintained with high availability and stable latency at all times. They need to be installed and tuned very carefully (e.g., fault tolerance, knob settings, resource pool settings, etc.). Nevertheless, no matter how thoroughly a database has been tuned, it is still challenging to maintain stable latency [2]. In real-world scenarios, databases receive various queries with a wide range of complexities at any given time.
Some of these queries are suboptimal or even nonsensical, and they may cause a database to run with unexpectedly long latency and fail to guarantee service quality [4]. In those cases, database administrators

© Springer Nature Switzerland AG 2019
F. Wotawa et al. (Eds.): IEA/AIE 2019, LNAI 11606, pp. 63–70, 2019.
https://doi.org/10.1007/978-3-030-22999-3_6