AVID: GPU-enabled Visual Analytics with GPU-FAST-PROCLUS
Jakob Rødsgaard Jørgensen
jakobrj@cs.au.dk
Department of Computer Science
Aarhus University
Denmark
Ira Assent
ira@cs.au.dk
Department of Computer Science
DIGIT Centre for Digitalisation, Big
Data and Data Analytics
Aarhus University
Denmark
Hans-Jörg Schulz
hjschulz@cs.au.dk
Department of Computer Science
Aarhus University
Denmark
ABSTRACT
GPU-FAST-PROCLUS is a GPU-parallelized algorithm for projected
clustering based on the $k$-medoids approach. It speeds up
clustering to allow for real-time interaction, even for datasets of
millions of items. Interactivity allows users to quickly determine
sensible clustering parameters such as the number of clusters $k$,
provided a suitable visualization is available. Yet, as clustering
and visualization are usually decoupled, cluster results are
funneled from the GPU back to the CPU, only to be mapped onto
appropriate graphics, which are then rendered on the GPU again.
This introduces a bottleneck that hinders fluid interaction with
clustering.
As a solution to this, we propose AVID (Analysis and
Visualization In Device). Following the principle "What happens on
the GPU, stays on the GPU", AVID removes the round trip to the
CPU and keeps clustering results on the GPU to render them
on the GPU directly. By doing so, users can interactively tune
projected clustering parameters and observe the effects without
noticeable delay. In our demo system, we showcase the efficiency
of our data management strategies for projected clustering as
well as the efficacy of data visualization.
1 INTRODUCTION
Projected clustering aims to identify groups of similar objects
in subspace projections of the full-dimensional space. Efficient
algorithms for projected clustering are crucial as the number of
possible subspace projections is exponential in the number of
dimensions. Projected clustering algorithms must be provided
with predefined parameters, but the best parameters are rarely
known in advance. The choice of sensible parameters generally
requires a human in the loop [4].
To enable interactive, human-in-the-loop parametrization of
clustering, the effects of a change in parameters must be
observable at interactive framerates. This usually means that
results must be computed in around 100 ms to reduce the temporal
separation [13, p.140] between parameter change and visualization
change, thus providing the necessary "fluidity" [3]. In
Jørgensen et al. [6], we present GPU-FAST-PROCLUS, a
GPU-parallelized algorithm that computes projected clusters under
the definition of the well-known PROCLUS approach [2], which
extends $k$-medoids clustering to subspace projections.
GPU-FAST-PROCLUS runs on a million points in around 100 ms, and
therefore theoretically allows for real-time interaction [11]. Yet, in
order to visualize the results of GPU-FAST-PROCLUS and allow
their interactive exploration under different parameterizations
and in different projections, similar to the works by Tatu et
al. [12] or Yuan et al. [15], we would need to visualize these
millions of points. To do so, the data would be clustered on the
GPU (Graphics Processing Unit), then transferred back to the
CPU and mapped onto graphics primitives using some graphics
framework, only to then be rendered on the GPU again.
© 2022 Copyright held by the owner/author(s). Published in Proceedings of the
25th International Conference on Extending Database Technology (EDBT), 29th
March-1st April, 2022, ISBN 978-3-89318-085-7 on OpenProceedings.org.
Distribution of this paper is permitted under the terms of the Creative Commons
license CC-by-nc-nd 4.0.
To prevent the bottleneck of the CPU, we propose to compute
both the cluster analysis and the visualization as a combined
pipeline directly on the GPU. While GPU-based visualization
is widely used [5, 10, 14], GPU-based Visual Analytics combin-
ing computational analysis and visualization on the GPU is still
very rare, with only a handful of systems having been published,
e.g., [1, 7, 9]. To the best of our knowledge, no such purely
GPU-based solution exists for computing and visualizing
projected clusterings. Hence, we propose and demonstrate AVID
(Analysis and Visualization In Device), a real-time interactive
data visualization for GPU-FAST-PROCLUS.
2 PROCLUS AND GPU-FAST-PROCLUS
PROCLUS [2] is an axis-parallel projected clustering algorithm,
inspired by the $k$-medoids algorithm CLARANS [8]. Given a
dataset and the parameters
• number of clusters $k$,
• average number of dimensions $l$, and
• scalars $A$ and $B$,
PROCLUS returns a cluster assignment for each point in some
axis-aligned subspace projection for the respective cluster. To
that end, PROCLUS proceeds in three phases:
(1) Greedily picking a set of potential medoids from the dataset.
(2) Iteratively improving the current set of medoids, a subset of
the potential medoids, to find the one that yields the best
projected clustering.
(3) Further refining the best clustering.
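The greedy picking in phase (1) can be illustrated with a farthest-first traversal, which repeatedly selects the point that lies farthest from all medoids chosen so far. The following is a minimal CPU sketch, not the GPU implementation from [6]; the function names, the use of the Manhattan distance in the full-dimensional space, and the seeding are assumptions made for illustration.

```python
import random


def manhattan(p, q):
    """Full-dimensional Manhattan distance (assumed here for illustration)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def greedy_medoids(points, n_medoids, seed=0):
    """Farthest-first traversal; returns indices of well-separated points.

    Hypothetical helper: starts from a random point, then repeatedly adds
    the point whose distance to its closest chosen medoid is largest.
    """
    if not 0 < n_medoids <= len(points):
        raise ValueError("n_medoids must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    chosen = [first]
    # min_dist[i]: distance from point i to its closest chosen medoid.
    min_dist = [manhattan(p, points[first]) for p in points]
    while len(chosen) < n_medoids:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = manhattan(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Three well-separated groups: {0, 1}, {2, 3}, and {4}.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
medoids = greedy_medoids(points, 3)
```

On this toy input, the traversal picks one point from each of the three groups regardless of the random starting point, which is the property the greedy phase relies on to obtain spread-out medoid candidates.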
The final result is a set of $k$ projected clusters, each within a
subspace of $l$ dimensions on average. E.g., if we have $k = 3$ and
$l = 4$, clusters could exist within subspaces of 2, 3, and 7
dimensions, since (2 + 3 + 7)/3 = 4.
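How a total budget of $k \cdot l$ dimensions can be spread unevenly over the clusters is sketched below, following the dimension selection described for PROCLUS [2]: every cluster first receives its two best-scoring dimensions, and the remaining $k \cdot l - 2k$ slots go to the best remaining (cluster, dimension) pairs overall. The function name and the score values are hypothetical; the scores are hand-picked illustrative numbers where lower means more suitable, not output of an actual run.

```python
def pick_dimensions(z, avg_dims):
    """Distribute k * avg_dims dimensions over k clusters.

    z[i][j] is an assumed suitability score of dimension j for cluster i
    (lower is better). Each cluster gets at least two dimensions.
    """
    k = len(z)
    d = len(z[0])
    if not 2 <= avg_dims <= d:
        raise ValueError("avg_dims must be between 2 and the dimensionality")
    budget = k * avg_dims
    dims = []
    rest = []
    for i, row in enumerate(z):
        order = sorted(range(d), key=row.__getitem__)
        dims.append(set(order[:2]))  # two mandatory dimensions per cluster
        rest.extend((row[j], i, j) for j in order[2:])
    rest.sort()  # best remaining (score, cluster, dimension) triples first
    for _, i, j in rest[: budget - 2 * k]:
        dims[i].add(j)
    return [sorted(s) for s in dims]


# Hypothetical scores for k = 3 clusters in 8 dimensions.
z = [
    [-2.0, -1.9, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    [-1.5, -1.4, -1.3, 0.5, 0.6, 0.7, 0.8, 0.9],
    [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, 2.0],
]
subspaces = pick_dimensions(z, avg_dims=4)
sizes = [len(s) for s in subspaces]  # [2, 3, 7]
```

With these scores, the three clusters end up in subspaces of 2, 3, and 7 dimensions, which sum to the budget of 12 and average to $l = 4$.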
Our GPU-FAST-PROCLUS approach [6] provides efficient
GPU-parallelization of PROCLUS clustering and even supports reusing
computations between parameter settings, which is important in
practice when determining the best set of parameters for the dataset
and analysis task at hand. In Jørgensen et al. [6], we also provide
an experimental evaluation on both real-world and synthetic
datasets with varying size, dimensionality, distribution, and
parameter settings. In the following, we provide a brief overview,
with more details given in [6].
Speed-up is achieved by maintaining the distances from
all points to all previously used medoids. Furthermore, the
computation of scores $Z_{i,j}$, which indicate the suitability of
medoid $m_i$ in dimension $j$, is reorganized. The most expensive
part of computing $Z_{i,j}$ is the sum of distances from each medoid
$m_i$ to all points that are within that medoid's sphere of influence
along each dimension $j$. The sphere of influence $L_i$ is all points
Demonstration Paper
Series ISSN: 2367-2005 562 10.48786/edbt.2022.51