AVID: GPU-enabled Visual Analytics with GPU-FAST-PROCLUS

Jakob Rødsgaard Jørgensen
jakobrj@cs.au.dk
Department of Computer Science, Aarhus University, Denmark

Ira Assent
ira@cs.au.dk
Department of Computer Science, DIGIT Centre for Digitalisation, Big Data and Data Analytics, Aarhus University, Denmark

Hans-Jörg Schulz
hjschulz@cs.au.dk
Department of Computer Science, Aarhus University, Denmark

ABSTRACT

GPU-FAST-PROCLUS is a GPU-parallelized algorithm for projected clustering based on the k-medoids approach. It speeds up clustering to allow for real-time interaction – even for datasets of millions of items. Interactivity allows users to quickly determine sensible clustering parameters such as the number of clusters k, provided a suitable visualization is available. Yet, as clustering and visualization are usually decoupled, cluster results are funneled from the GPU back to the CPU, only to be mapped onto appropriate graphics, which are then rendered on the GPU again. This introduces a bottleneck that hinders fluid interaction with clustering.

As a solution to this, we propose AVID (Analysis and Visualization In Device). Following the principle “What happens on the GPU, stays on the GPU”, AVID removes the round trip to the CPU and keeps clustering results on the GPU to render them on the GPU directly. By doing so, users can interactively tune projected clustering parameters and observe the effects without noticeable delay. In our demo system, we showcase the efficiency of our data management strategies for projected clustering as well as the efficacy of data visualization.

1 INTRODUCTION

Projected clustering aims to identify groups of similar objects in subspace projections of the full-dimensional space. Efficient algorithms for projected clustering are crucial, as the number of possible subspace projections is exponential in the number of dimensions. Projected clustering algorithms must be provided with predefined parameters, but the best parameters are rarely known in advance.
The choice of sensible parameters generally requires a human in the loop [4].

To enable interactive, human-in-the-loop parametrization of clustering, the effects of a change in parameters must be observable at interactive framerates. This usually means that results must be computed in around 100 ms to reduce the temporal separation [13, p. 140] between parameter change and visualization change, thus providing the necessary “fluidity” [3]. In Jørgensen et al. [6], we present GPU-FAST-PROCLUS, a GPU-parallelized algorithm that computes projected clusters under the definition of the well-known PROCLUS approach [2], which extends k-medoids clustering to subspace projections. GPU-FAST-PROCLUS runs on a million points in around 100 ms and therefore theoretically allows for real-time interaction [11].

Yet, in order to visualize the results of GPU-FAST-PROCLUS and allow their interactive exploration under different parameterizations and in different projections – similar to the works by Tatu et al. [12] or Yuan et al. [15] – we would need to visualize these millions of points. To do so, the data would be clustered on the GPU (Graphics Processing Unit), then transferred back to the CPU and mapped onto graphics primitives using some graphics framework, only to be rendered on the GPU again.

© 2022 Copyright held by the owner/author(s). Published in Proceedings of the 25th International Conference on Extending Database Technology (EDBT), 29th March-1st April, 2022, ISBN 978-3-89318-085-7 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

To avoid this CPU bottleneck, we propose to compute both the cluster analysis and the visualization as a combined pipeline directly on the GPU.
While GPU-based visualization is widely used [5, 10, 14], GPU-based Visual Analytics combining computational analysis and visualization on the GPU is still very rare, with only a handful of systems having been published – e.g., [1, 7, 9]. To the best of our knowledge, no such purely GPU-based solution exists for computing and visualizing projected clusterings. Hence, we propose and demonstrate AVID (Analysis and Visualization In Device), a real-time interactive data visualization for GPU-FAST-PROCLUS.

2 PROCLUS AND GPU-FAST-PROCLUS

PROCLUS [2] is an axis-parallel projected clustering algorithm, inspired by the k-medoids algorithm CLARANS [8]. Given a dataset and the parameters number of clusters k, average number of dimensions l, and scalars A and B, PROCLUS returns a cluster assignment for each point in some axis-aligned subspace projection for the respective cluster. To that end, PROCLUS proceeds in three phases:

(1) Greedily picking potential medoids.
(2) Iteratively improving the current set of medoids to find the set that yields the best projected clustering.
(3) Further refining the best clustering.

The final result is a set of projected clusters, each within a subspace of l dimensions on average. E.g., if we have k = 3 and l = 4, the three clusters could exist within subspaces of 2, 3, and 7 dimensions, which average to 4.

Our GPU-FAST-PROCLUS approach [6] provides efficient GPU-parallelization of PROCLUS clustering and even supports reusing computations between parameter settings, which is important in practice when determining the best set of parameters for a dataset and analysis task at hand. In Jørgensen et al. [6], we also provide an experimental evaluation on both real-world and synthetic datasets with varying size, dimensionality, distribution, and parameter settings.

In the following, we provide a brief overview, with more details given in [6]. Speed-up is achieved by maintaining the distances from all points to all previously used medoids.
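As a rough illustration of the distance reuse described above, the following CPU-side Python sketch caches one column of point-to-medoid distances per medoid, so that a medoid that reappears in a later iteration or parameter setting costs no new distance computations. It is not the GPU implementation from [6]: the class name `MedoidDistanceCache`, its methods, and the use of Euclidean distance are assumptions made purely for illustration.

```python
import math


class MedoidDistanceCache:
    """Hypothetical helper: keeps distances from all points to every medoid used so far."""

    def __init__(self, points):
        self.points = points   # list of equal-length coordinate tuples
        self._cache = {}       # medoid index -> list of distances to all points
        self.computed = 0      # number of distance columns actually computed

    def distances_to(self, medoid_idx):
        # Compute the distance column only the first time this medoid is used.
        if medoid_idx not in self._cache:
            medoid = self.points[medoid_idx]
            # Assumption: full-dimensional Euclidean distance.
            self._cache[medoid_idx] = [math.dist(p, medoid) for p in self.points]
            self.computed += 1
        return self._cache[medoid_idx]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
cache = MedoidDistanceCache(points)
first = cache.distances_to(0)    # computed on first use
second = cache.distances_to(0)   # served from the cache, nothing recomputed
```

In this sketch, the second lookup returns the stored column, so the cost of a medoid is paid once no matter how often the medoid set changes around it.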
Furthermore, the computation of the scores that indicate the suitability of medoid i in dimension j is reorganized. The most expensive part of computing these scores is the sum of distances from each medoid i to all points that are within that medoid’s sphere of influence along each dimension j.

Demonstration Paper Series, ISSN: 2367-2005, DOI: 10.48786/edbt.2022.51

The sphere of influence is all points