Algorithms for Gaussian Bandwidth Selection in Kernel Density Estimators

José Miguel Leiva Murillo and Antonio Artés Rodríguez
Department of Signal Theory and Communications, Universidad Carlos III de Madrid
E-mail: {leiva,antonio}@ieee.org

Abstract

In this paper we study the classical statistical problem of choosing an appropriate bandwidth for Kernel Density Estimators. For the special case of the Gaussian kernel, two algorithms are proposed: one for a spherical covariance matrix and one for the general case. These methods avoid the unsatisfactory procedure of tuning the bandwidth while evaluating the likelihood, which is impractical with multivariate data in the general case. Convergence conditions are provided together with the proposed algorithms. We measure the accuracy of the resulting models through a set of classification experiments.

1 Introduction

A Kernel Density Estimator (KDE) is a non-parametric Probability Density Function (PDF) model that consists of a linear combination of kernel functions centered on the training data \{x_i\}_{i=1,\dots,N}, i.e.:

\hat{p}_\theta(x) = \frac{1}{N} \sum_{i=1}^{N} k_\theta(x - x_i)    (1)

where k_\theta(x) is the kernel function, which must be unitary, i.e. \int k_\theta(x)\,dx = 1, and x \in \mathbb{R}^D. Although KDEs are commonly considered non-parametric models, the kernel function is characterized by a bandwidth that determines the accuracy of the model: \hat{p}_\theta(x) = \hat{p}(x \mid \theta). Kernels that are too narrow or too wide lead to overfitted or underfitted models, respectively.

Classical bandwidth selection methods have mainly focused on the unidimensional case. In [1], several first- and second-generation methods are compiled. Examples of first-generation criteria are the Mean Squared Error (MSE), the Mean Integrated Squared Error (MISE), and the asymptotic MISE (AMISE) [1][2]. Second-generation methods include plug-in techniques and bootstrap methods. The Kullback-Leibler divergence has also been considered [3]. We are interested in the Maximum-Likelihood (ML) criterion.
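The KDE of Eq. (1) with a spherical Gaussian kernel can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function and variable names (gaussian_kde, data, h) are our own, and h denotes a single scalar bandwidth (the spherical-covariance case).

```python
import numpy as np

def gaussian_kde(x, data, h):
    """Evaluate a spherical-Gaussian KDE (Eq. 1) at a query point x.

    x: query point, shape (D,); data: training samples, shape (N, D);
    h: scalar bandwidth (kernel standard deviation).
    """
    N, D = data.shape
    diff = data - x                       # (N, D) differences x_i - x
    sq_dist = np.sum(diff ** 2, axis=1)   # squared distances ||x - x_i||^2
    # Unitary Gaussian kernel: (2*pi*h^2)^(-D/2) * exp(-||x - x_i||^2 / (2 h^2))
    kernels = np.exp(-sq_dist / (2.0 * h ** 2)) / (2.0 * np.pi * h ** 2) ** (D / 2.0)
    return kernels.mean()                 # average of the N kernel evaluations
```

Because each kernel integrates to one, the resulting density estimate also integrates to one, regardless of the chosen bandwidth.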
Cross-validation allows us to apply the ML criterion so that a model built from N - 1 samples is