Are Databases Fit for Hybrid Workloads on GPUs? A Storage Engine’s Perspective

Marcus Pinnecke, David Broneske, Gabriel Campero Durand¹ and Gunter Saake
University of Magdeburg
Email: {firstname.lastname}@ovgu.de & ¹campero@ovgu.de

Abstract—Employing special-purpose processors (e.g., GPUs) in database systems has been studied throughout the last decade. Research on heterogeneous database systems that use both general- and special-purpose processors has addressed either transactional or analytical processing, but not the combination of both. Support for hybrid transactional/analytical processing (HTAP) has been studied exclusively for CPU-only systems. In this paper, we ask whether current systems are ready for HTAP workload management with cooperating general- and special-purpose processors. To answer this, we take the perspective of the backbone of database systems: the storage engine. We propose a unified terminology and a comprehensive taxonomy to compare state-of-the-art engines from both domains. We show similarities and differences, and determine a necessary set of features for engines supporting HTAP workloads on CPUs and GPUs. Answering our research question, our findings yield a resolute answer: not yet.

I. INTRODUCTION

Database systems today face two challenges in the presence of mixed workload types (cf. Figure 1): continuous organization of the physical record layout and continuous assignment of compute devices. On the one hand, database systems need to combine simultaneous support for analytical and transactional processing [1], [2], [3], [4]. Merging both processing types into a single system promises greater business value by minimizing analytic latency and data synchronization effort [5]. On the other hand, database systems must make optimal use of a wide range of heterogeneous processor types, such as Graphics Processing Units (GPUs), Many Integrated Cores (MICs), or Field Programmable Gate Arrays (FPGAs).
Building on these heterogeneous compute platforms is necessary to overcome limitations such as the power wall [6]. Research on heterogeneous systems introduces design considerations into single-machine system architectures [7], [8], [9], [10], [11] that are similar to those of distributed computing [12] and federated systems [13], [14]. These design considerations are driven by the following challenges: (a.i) expensive data transfer to and from the device memory, (a.ii) different memory types per compute platform, and (a.iii) strict limitations on device memory capacity. Consequently, heterogeneous systems demand special locality-aware approaches that support column-based placement of parts of a relation [7], [10], as well as tailored data placement strategies that prevent query performance from degrading through cache thrashing and other side effects during query processing [15], [16]. Database systems supporting Hybrid Transactional/Analytical Processing (HTAP) workloads [5] also demand special design considerations. HTAP database systems, such as HyPer [1], Peloton [2], and SAP HANA [17], address particular challenges implied by combining analytical and transactional workload processing in one system.

Fig. 1. Physical record layout re-organization and compute device re-assignment in database systems that manage HTAP workloads efficiently. [Figure labels: transactional and analytical workloads (OLTP optimized, HTAP optimized, OLAP optimized); compute devices (main processor only, co-processor accelerated, co-processor only).]
These challenges are: (b.i) different data access patterns implied by different workload types, (b.ii) continuous physical optimization in the face of contradicting optimization goals, and (b.iii) efficient processing of both workload types without interference between long-running ad-hoc analytic queries and massive, short-lived, write-intensive transactional queries. Consequently, HTAP systems demand special concepts for physical storage layout handling [18], including the capability to adapt to workload changes at runtime [2], [3], [19], and advanced techniques to detach analytic query execution from mission-critical transactional data [1], [20]. A storage engine is highly tailored to the challenges that a database system faces and is fundamental to the entire system. In this paper, we argue that the design decisions currently proposed to face these challenges (a.i–a.iii and b.i–b.iii) might be complementary to each other, especially when considered from the perspective of a storage engine.

We proceed as follows: we first provide background on physical record organization, including our experimental findings (Section II). We then make the following contributions to bridge the gap between the design solutions from both fields:
• A novel storage engine design taxonomy (Section III).
• A survey and classification of state-of-the-art systems from both fields (Sections IV-A and IV-B).
• An identification of characteristics for HTAP workloads on CPU / GPU systems (Section IV-C).

Author Copy of: Marcus Pinnecke, David Broneske, Gabriel Campero Durand and Gunter Saake. Are Databases Fit for Hybrid Workloads on GPUs? A Storage Engine’s Perspective. IEEE 33rd International Conference on Data Engineering (ICDE), 2017, pp 1599-1606, DOI 10.1109/ICDE.2017.237
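Challenge (b.i) arises because transactional and analytic queries favor different physical record layouts. The following minimal Python sketch contrasts a row-wise and a column-wise layout of the same relation. It rests on our own assumptions: the three-attribute schema, the class names `RowStore` and `ColumnStore`, and the helpers `scan_bytes_row` and `scan_bytes_column` are hypothetical and do not come from any surveyed system. Python objects also do not reproduce the physical memory layout of a real engine, so the byte counts come from an explicit, simplified cost model that assumes fixed-width 8-byte attributes and ignores cache-line granularity and compression.

```python
from array import array

ATTR_BYTES = 8  # assumption: every attribute is a fixed-width 8-byte value


class RowStore:
    """Row-wise layout: all attribute values of a record are kept together."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        # Transactional insert: a single append of one complete record.
        self.records.append(tuple(record))

    def get(self, idx):
        # Point lookup: the full record is already stored as one unit.
        return self.records[idx]

    def sum_column(self, col):
        # Analytic aggregate: visits every record to extract one attribute.
        return sum(record[col] for record in self.records)


class ColumnStore:
    """Column-wise layout: each attribute lives in its own dense typed array."""

    def __init__(self, typecodes):
        # One typed array per attribute, e.g. "q" = signed 64-bit int, "d" = double.
        self.columns = [array(tc) for tc in typecodes]

    def insert(self, record):
        # Transactional insert: one append per attribute array.
        for column, value in zip(self.columns, record):
            column.append(value)

    def get(self, idx):
        # Point lookup: the record must be reassembled from all columns.
        return tuple(column[idx] for column in self.columns)

    def sum_column(self, col):
        # Analytic aggregate: reads a single dense array.
        return sum(self.columns[col])


def scan_bytes_row(n_records, n_attrs):
    # Simplified model: scanning one attribute streams whole records.
    return n_records * n_attrs * ATTR_BYTES


def scan_bytes_column(n_records):
    # Simplified model: scanning one attribute streams only that column.
    return n_records * ATTR_BYTES


if __name__ == "__main__":
    # Hypothetical relation (id, price, quantity); the schema is illustrative only.
    rows = [(1, 9.5, 3), (2, 4.0, 10), (3, 12.25, 1)]
    row_store, column_store = RowStore(), ColumnStore("qdq")
    for r in rows:
        row_store.insert(r)
        column_store.insert(r)
    print(row_store.get(1), column_store.get(1))
    print(row_store.sum_column(1), column_store.sum_column(1))
    print(scan_bytes_row(1_000_000, 3), scan_bytes_column(1_000_000))
```

Under this model, scanning one attribute of a million three-attribute records reads 24,000,000 bytes in the row layout but only 8,000,000 bytes in the column layout, whereas an insert or a full-record lookup touches a single unit in the row layout but every attribute array in the column layout. This tension is why no single fixed layout suits both workload types, and why layout re-organization appears as one axis of Figure 1.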