Parallel Computing of Kernel Density Estimation with Different Multi-core
Programming Models
Panagiotis D. Michailidis
University of Western Macedonia
Florina, Greece
Email: pmichailidis@uowm.gr
Konstantinos G. Margaritis
University of Macedonia
Thessaloniki, Greece
Email: kmarg@uom.gr
Abstract—Kernel density estimation is nowadays a very popular
tool for nonparametric probability density estimation. One of
its most important disadvantages is the computational
complexity of the calculations needed, especially for large data
sets. One way to accelerate these computations is to use
parallel computing on multi-core platforms. In this paper
we parallelize two kernel estimation methods, namely
univariate and multivariate kernel estimation, from the field
of computational econometrics on a multi-core platform
using different programming frameworks: Pthreads,
OpenMP, Intel Cilk++, Intel TBB, SWARM and FastFlow. The
purpose of this paper is to present an extensive quantitative
(i.e., performance) and qualitative (i.e., ease of programming
effort) study of these multi-core programming frameworks
for the two kernel estimation methods.
Keywords-Kernel density estimation; parallel computing;
multi-core; parallel programming
I. INTRODUCTION
Nonparametric methods are becoming more commonplace
in applied data analysis, modeling and inference. One of
their main tools is kernel density estimation (KDE), which has
been successfully applied in a large number of application
domains spanning a range of fields, including computational
econometrics, market analysis and biostatistics, to name but
a few. There are numerous publications and references on
kernel density estimation methods, mostly concerning
theoretical and practical aspects of the estimation methods;
see, for example, Silverman [25], Wand and Jones [27] and
Klemelä [19].
Kernel density estimation methods typically have computational
order O(n²k), where n is the number of observations
and k the number of variables. In many cases data
sets have become larger in recent years, and kernel
estimation methods have become more computationally intensive
as econometricians estimate more complicated models and
use more sophisticated estimation techniques. Methods
of data-based bandwidth selection, such as cross-validation,
also have high computational requirements [15].
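For concreteness, the standard univariate kernel estimator with kernel K and bandwidth h (see, e.g., Silverman [25]) makes the quadratic cost explicit:

\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)

Evaluating \hat{f}_h at each of the n sample points takes n kernel evaluations per point, i.e., O(n²) operations in total; with a product kernel over k variables this grows to O(n²k).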
A few approximation techniques have been proposed to
reduce the huge computational requirements of kernel
density estimation methods. The first of them, proposed by
Silverman [26], uses the Fast Fourier Transform (FFT). The
other applies the Fast Gauss Transform (FGT), as suggested
by Elgammal [16].
An alternative way to satisfy the computational demands
of kernel estimation methods is to use parallel computing
on clusters of workstations and multi-core platforms. The
key idea of parallel computing is to divide a
large-scale problem into a number of smaller problems that
can be solved concurrently on independent computers. There
are many references on parallel computing for related
nonparametric and econometric methods and applications;
see, for example, Adams et al. [5] and Creel and Goffe [13]
for a review, while the monographs by Kontoghiorghes [20],
[21] treat parallel algorithms for statistics and linear econometric
models. However, there are only a few research works
on the parallelization of kernel density methods. For
example, Racine [23] presented a parallel implementation
of kernel density estimation on a cluster of workstations
using the MPI library. Further, Creel [12] implemented the
kernel regression method in parallel on a cluster of workstations
using the MPI toolbox (MPITB) for GNU Octave [17].
Recently, Lukasik [22] presented three parallelizations for
kernel estimation, bandwidth selection and adaptation on a
cluster of computers using the MPI programming model. The
parallelizations of kernel estimation methods in the previous
papers are based on the data partitioning technique, where
each computer executes the same operations on a different
portion of a large data set.
Against this research background, there is no extensive
research work on the parallelization of kernel estimation
methods on multi-core platforms. For programming
multi-core processors there are many representative parallel
programming frameworks that simplify the parallelization
of computationally intensive applications. These frameworks
are Pthreads [11], OpenMP [4], Intel Cilk++ [1],
Intel TBB [2], SWARM [9] and FastFlow [6], [7]. These
frameworks are based on a small set of extensions to the C
programming language and involve a relatively simple compilation
phase and a potentially much more complex runtime
system. We must note that there is little related work on
comparing different parallel programming frameworks on
multi-core platforms for various applications. For example, a
2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
1066-6192/12 $26.00 © 2012 IEEE
DOI 10.1109/PDP.2013.20
77