Adaptive Multi-Model Reinforcement Learning for Online Database Tuning

Yaniv Gur, IBM Almaden Research Center, San Jose, CA, guryaniv@us.ibm.com
Dongsheng Yang, Princeton University, Princeton, NJ, dy5@princeton.edu
Frederik Stalschus, DHBW Stuttgart, Stuttgart, Germany, frederik.stalschus@ibm.com
Berthold Reinwald, IBM Almaden Research Center, San Jose, CA, reinwald@us.ibm.com

ABSTRACT

Mainstream DBMSs provide hundreds of knobs for performance tuning. Tuning these knobs requires experienced database administrators (DBAs), who are often unavailable to owners of small-scale databases, a common scenario in the era of cloud computing. Algorithms that can automatically tune database performance with minimal human guidance are therefore of increasing importance. Developing an automatic database tuner poses several challenges. First, out-of-the-box machine learning solutions cannot be applied directly to this problem and must be adapted to perform well on it. Second, training samples are scarce, owing to the time it takes to collect each data point and the limited access to query data submitted by database users. Third, databases are complicated systems with unstable performance, which leads to noisy training data. Furthermore, in a realistic online environment, workloads change as users run different applications at different times. Although several research projects address automatic database tuning, they have not fully tackled this challenge, as they are mainly designed for offline training where the workload does not change. In this paper, we address the challenge of online tuning under evolving workloads by proposing a multi-model tuning algorithm that leverages multiple Deep Deterministic Policy Gradient (DDPG) reinforcement learning models trained on varying workloads. To evaluate our approach, we have implemented a system for tuning a PostgreSQL database.
The results show that we can automatically tune a PostgreSQL database, improving its performance on OLTP workloads, and that our multi-model approach can adapt to changing workloads.

© 2021 Copyright held by the owner/author(s). Published in Proceedings of the 24th International Conference on Extending Database Technology (EDBT), March 23-26, 2021, ISBN 978-3-89318-084-4 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1 INTRODUCTION

Modern DBMSs have hundreds of configuration knobs that affect their performance. A DBMS that is not configured properly for the current workload may deliver sub-optimal performance and use system resources inefficiently, leaving hundreds of users without the performance they need for their applications. Monitoring and configuring a DBMS has traditionally been the job of a database administrator (DBA), an expert dedicated to this task. Nowadays, however, multiple DBMS instances are deployed on the cloud, and each instance may host hundreds of databases; monitoring and configuring a large-scale database infrastructure would therefore require a large number of DBAs and lead to high operating costs.

Over the last few years, several database vendors have identified the potential of using machine learning to automate database tasks on the cloud, such as automatic indexing, configuration, and provisioning. Examples include the autonomous database from Oracle [11] and the self-driving database from Alibaba [1]. The study of autonomous databases using AI is a very active research area that has already yielded a large number of papers, and the most popular machine-learning paradigm in recent work is reinforcement learning [7, 9, 14, 18]. Born as a machine-learning branch for solving complex control problems, reinforcement learning is a natural choice for automatic database tuning tasks.
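To make this framing concrete, the sketch below shows the basic interaction loop of reinforcement-learning-based knob tuning: an agent observes the database state, proposes a continuous knob configuration (the action), and receives a reward derived from measured throughput relative to the default configuration. Everything here is an illustrative assumption, not the paper's implementation: the knob names and ranges, the toy throughput model standing in for a real benchmark run, and the random action sampler standing in for a trained DDPG actor network.

```python
import random

# Illustrative knob ranges for a PostgreSQL-like DBMS (assumed, not from the paper).
KNOBS = {
    "shared_buffers_mb": (128, 4096),
    "work_mem_mb": (1, 256),
}

def measure_throughput(config, rng):
    """Toy stand-in for replaying a workload against the database:
    throughput grows as knobs approach their upper range, plus noise."""
    base = 100.0
    gain = sum((config[k] - lo) / (hi - lo) for k, (lo, hi) in KNOBS.items())
    return base * (1.0 + gain) + rng.gauss(0.0, 2.0)

def random_action(rng):
    """Placeholder for a DDPG actor: samples a continuous knob setting.
    A real actor network would map the observed state to this action."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in KNOBS.items()}

def tuning_loop(steps=50, seed=0):
    """Run the observe-act-reward loop and track the best configuration."""
    rng = random.Random(seed)
    default = {k: lo for k, (lo, _) in KNOBS.items()}
    baseline = measure_throughput(default, rng)
    best_config, best_tp = default, baseline
    for _ in range(steps):
        action = random_action(rng)
        tp = measure_throughput(action, rng)
        reward = (tp - baseline) / baseline  # relative improvement over default
        if tp > best_tp:
            best_config, best_tp = action, tp
    return baseline, best_tp, best_config

baseline, best_tp, best = tuning_loop()
print(f"baseline={baseline:.1f} tps, best found={best_tp:.1f} tps")
```

In a real system, the reward signal would drive gradient updates to the actor and critic networks rather than a simple best-so-far comparison, and the throughput measurement would come from an actual benchmark run against the configured database.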
One of the main challenges of operating an automatic DBMS tuning system on the cloud is that the database environment is dynamic: system resources, workloads, and database size can change over the course of a day. An automatic tuning system therefore needs to be flexible and adapt to these changes in order to provide optimal performance for a given environment state. In this paper, we address the problem of changing workloads in an online tuning setting, and we employ reinforcement learning for this task. While query-aware formulations for tuning have been proposed before [7, 16], the problem of changing workloads in an online tuning setting has not been fully addressed. Our main contributions in this paper are as follows:

• We propose a multi-model online tuning algorithm, sensitive to workload changes, that leverages multiple DDPG reinforcement learning models and selects the optimal model for evolving workloads.
• We propose a simple reward function formulation for offline and online tuning and show that it yields a more stable learning curve than prior work [18].
• We demonstrate the offline and online tuning algorithms on a PostgreSQL database and show that its performance can be significantly improved over the default-configuration baseline.

2 RELATED WORK

In recent years, multiple studies have addressed the problem of automatic DBMS tuning using various machine-learning techniques. In [5], a method called adaptive sampling was used to automate knob configuration selection by sampling from past experience, and in OtterTune [16], Gaussian Process (GP) regression was used to recommend the best knob settings. Reinforcement learning over continuous action configuration space
Short Paper · Series ISSN: 2367-2005 · 10.5441/002/edbt.2021.48