Adaptive Physical Design for Curated Archives

Tanu Malik 1, Xiaodan Wang 2, Debabrata Dash 3, Amitabh Chaudhary 4, Anastasia Ailamaki 5, Randal Burns 2

1 Purdue University, USA, tmalik@purdue.edu
2 Johns Hopkins University, USA, {xwang,randal}@cs.jhu.edu
3 Carnegie Mellon University, USA, ddash@cs.cmu.edu
4 University of Notre Dame, USA, achaudha@cse.nd.edu
5 Swiss Federal Institutes of Technology, Switzerland, anastasia.ailamaki@epfl.ch

Abstract. We introduce AdaptPD, an automated physical design tool that improves database performance by continuously monitoring changes in the workload and adapting the physical design to suit the incoming workload. Current physical design tools are offline and require specification of a representative workload. AdaptPD is “always on” and incorporates online algorithms which profile the incoming workload to calculate the relative benefit of transitioning to an alternative design. Efficient query and transition cost estimation modules allow AdaptPD to quickly decide between various design configurations. We evaluate AdaptPD with the SkyServer Astronomy database using queries submitted by SkyServer’s users. Experiments show that AdaptPD adapts to changes in the workload, improves query performance substantially over offline tools, and introduces minor computational overhead.

1 Introduction

Automated physical design tools are vital for large-scale databases to ensure optimal performance. Major database vendors such as Microsoft, IBM, and Oracle now include tuning and design advisers as part of their commercial offerings. The goal is to reduce a DBMS’ total cost of ownership by automating physical design tuning and providing DBAs with useful recommendations about the physical design of their databases. However, current tools [1–3] provide limited automation; they take an offline approach to physical design and leave several significant decisions during the tuning process to DBAs.
Specifically, DBAs must explicitly specify representative workloads for the tuning tool. They must also know when a tuning session is needed and estimate the relative benefit of implementing the recommendations.

Complete automation is a critical requirement for libraries, which will soon become data centers for the curation of large scientific data. The Sloan Digital Sky Survey (SDSS) [4] project is a notable example: its data will soon be curated by a library. The project receives a diverse workload that exceeds a million queries every month. As such, finding a representative workload is challenging because query access patterns