Research Article
Accelerating the HyperLogLog Cardinality Estimation Algorithm
Cem Bozkus¹ and Basilio B. Fraguela²
¹Bilkent Üniversitesi, Ankara, Turkey
²Universidade da Coruña, A Coruña, Spain
Correspondence should be addressed to Basilio B. Fraguela; basilio.fraguela@udc.es
Received 29 June 2017; Accepted 6 August 2017; Published 14 September 2017
Academic Editor: Piotr Luszczek
Copyright © 2017 Cem Bozkus and Basilio B. Fraguela. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In recent years, vast amounts of data of different kinds, from pictures and videos from our cameras to software logs from sensor networks and Internet routers operating day and night, are being generated. This has led to new big data problems, which require new algorithms to handle these large volumes of data and which are, as a result, very computationally demanding. In this paper, we parallelize one of these new algorithms, namely, the HyperLogLog algorithm, which estimates the number of different items in a large data set with minimal memory usage, as it lowers the typical memory usage of this type of calculation from O(n) to O(1). We have implemented parallelizations based on OpenMP and OpenCL and evaluated them on a standard multicore system, an Intel Xeon Phi, and two GPUs from different vendors. The results obtained in our experiments, in which we reach a speedup of 88.6 with respect to an optimized sequential implementation, are very positive, particularly taking into account the need to run this kind of algorithm on large amounts of data.
1. Introduction
Very often the processing of very large data sets does not require accurate solutions, as approximate ones that can be obtained much more efficiently are enough. This strategy, called approximate computing, has been used in computing for many years and can be applied in those contexts where answers that are close enough to the actual value are acceptable, giving place to a trade-off of accuracy for other resources, typically memory space and time. For example, the stochastic gradient descent algorithm of machine learning computes an approximate local minimum rather than the exact global minimum. Another example is Bloom filters [1], which allow easily checking whether an item is in a set or not by using multiple hash functions, with a certain probability of false positives, that is, of classifying as members of the set items that actually do not belong to it.
HyperLogLog (HLL) [2] is a very powerful approximate algorithm in the sense that it can practically and efficiently give a good estimation of the cardinality of a data set, meaning the number of different items in it with respect to some characteristics. This value has many real-life applications, making its computation a must for high-profile companies working in big data. This algorithm is useful to basically anyone who has a data set and needs its cardinality without wasting space, as it can calculate the cardinality with O(1) space complexity. Because of the way it is implemented, HLL allows many smaller-budget companies and individuals without large memory banks to work with large data sets.
In this paper we develop two parallel implementations of
the HyperLogLog algorithm, one of them based on OpenMP
and targeted to multicore processors and Intel Xeon Phi
accelerators and another one based on OpenCL, which can
be run not only on these systems but also on other kinds of
accelerators such as GPUs. Both implementations are com-
pared on an Intel Xeon Phi and a standard multicore system,
while the performance of the OpenCL version is evaluated in
these platforms as well as in an NVIDIA Tesla K20m GPU
and an AMD FirePro S9150 GPU.
The remainder of this paper is organized as follows. Section 2 briefly summarizes the HyperLogLog algorithm and its sequential implementation, while Section 3 details our parallel implementations. This is followed by an evaluation in Section 4. Then, Section 5 discusses related work. Finally, Section 6 presents our conclusions.
Hindawi Scientific Programming, Volume 2017, Article ID 2040865, 8 pages. https://doi.org/10.1155/2017/2040865