Parallel Computing of Kernel Density Estimation with Different Multi-core Programming Models

Panagiotis D. Michailidis, University of Western Macedonia, Florina, Greece, Email: pmichailidis@uowm.gr
Konstantinos G. Margaritis, University of Macedonia, Thessaloniki, Greece, Email: kmarg@uom.gr

Abstract—Kernel density estimation is nowadays a very popular tool for nonparametric probability density estimation. One of its most important disadvantages is the computational complexity of the calculations needed, especially for large data sets. One way to accelerate these computations is to use parallel computing on multi-core platforms. In this paper we parallelize two kernel estimation methods from the field of computational econometrics, namely univariate and multivariate kernel estimation, on a multi-core platform using different programming frameworks: Pthreads, OpenMP, Intel Cilk++, Intel TBB, SWARM and FastFlow. The purpose of this paper is to present an extensive quantitative (i.e., performance) and qualitative (i.e., ease of programming effort) study of these multi-core programming frameworks for the two kernel estimation methods.

Keywords—Kernel density estimation; Parallel computing; Multi-core; Parallel programming

I. INTRODUCTION

Nonparametric methods are becoming more commonplace in applied data analysis, modeling and inference. One of their main tools is kernel density estimation (KDE), which has been successfully applied in a large number of application domains spanning a range of fields, including computational econometrics, market analysis and biostatistics, to name but a few. There are numerous publications and references about kernel density estimation methods, mostly concerning theoretical and practical aspects of the estimation methods; see, for example, Silverman [25], Wand and Jones [27] and Klemelä [19].
Kernel density estimation methods typically have computational order O(n^2 k), where n is the number of observations and k the number of variables. Data sets have been growing larger in recent years, and these kernel estimation methods are becoming more compute intensive as econometricians estimate more complicated models and utilize more sophisticated estimation techniques. Methods of data-based bandwidth selection, such as cross-validation, also have high computational requirements [15]. A few approximation techniques have been proposed for reducing the huge computational requirements of kernel density estimation methods. The first of them, proposed by Silverman [26], uses the Fast Fourier Transform (FFT). The other applies the Fast Gauss Transform (FGT), as suggested by Elgammal [16]. An alternative way to satisfy the computational demands of kernel estimation methods is to use parallel computing with clusters of workstations and multi-core platforms. The central idea of parallel computing is to divide a large-scale problem into a number of smaller problems that can be solved concurrently on independent computers. There are many references about parallel computing for related nonparametric and econometric methods and applications; see, for example, Adams et al. [5] and Creel and Goffe [13] for a review, and the monographs by Kontoghiorghes [20], [21], which treat parallel algorithms for statistics and linear econometric models. However, in the field of the parallelization of kernel density methods there are only a few research works. For example, Racine [23] presented a parallel implementation of kernel density estimation on a cluster of workstations using the MPI library. Further, Creel [12] implemented the kernel regression method in parallel on a cluster of workstations using the MPI toolbox (MPITB) for GNU Octave [17].
Recently, Lukasik [22] presented three parallelizations for kernel estimation, bandwidth selection and adaptation on a cluster of computers using the MPI programming model. The parallelizations of kernel estimation methods in the above papers are based on the data partitioning technique, where each computer executes the same operations on different portions of a large data set. Against this research background, there is no extensive research work in the field of the parallelization of kernel estimation methods on multi-core platforms. For programming multi-core processors there are many representative parallel programming frameworks that simplify the parallelization of computationally intensive applications. These frameworks include Pthreads [11], OpenMP [4], Intel Cilk++ [1], Intel TBB [2], SWARM [9] and FastFlow [6], [7]. They are based on a small set of extensions to the C programming language and involve a relatively simple compilation phase and a potentially much more complex runtime system. We must note that there is little related work comparing different parallel programming frameworks on multi-core platforms for several applications. For example, a

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 1066-6192/12 $26.00 © 2012 IEEE, DOI 10.1109/PDP.2013.20, p. 77