Computational Statistics and Data Analysis 56 (2012) 2206–2218
Journal homepage: www.elsevier.com/locate/csda

New approaches to nonparametric density estimation and selection of smoothing parameters

Nina Golyandina a, Andrey Pepelyshev a,b,*, Ansgar Steland b

a Faculty of Mathematics, St. Petersburg State University, Universitetskiy pr. 28, Petergof, St. Petersburg, 198504, Russia
b Institute of Statistics, RWTH Aachen University, Wüllnerstr. 3, D-52056 Aachen, Germany

Article info
Article history: Received 9 March 2011; received in revised form 18 November 2011; accepted 24 December 2011; available online 18 January 2012
Keywords: Empirical distribution function; Time series smoothing; Singular Spectrum Analysis; Adaptive filter; Acceptance sampling plans

Abstract
The application of Singular Spectrum Analysis (SSA) to the empirical distribution function, sampled on a grid of points spanning the range of the sample, leads to a novel and promising method for the computer-intensive nonparametric estimation of both the distribution function and the density function. SSA yields a data-adaptive filter whose length is a parameter that controls the smoothness of the filtered series. A data-adaptive algorithm is introduced for automatically selecting a general smoothing parameter that controls the number of modes of the estimated density. Extensive computer simulations demonstrate that the new automatic bandwidth selector improves on other popular methods for various densities of interest. A general uniform error bound is proved for the proposed SSA estimate of the distribution function, which ensures its uniform consistency. The simulation results indicate that the SSA density estimate with the automatic choice of the filter length outperforms the kernel density estimate in terms of the mean integrated squared error and the Kolmogorov–Smirnov distance for various density shapes.
Two applications, to problems arising in photovoltaic quality control and in economic market research, are studied to illustrate the benefits of SSA estimation.

© 2011 Elsevier B.V. All rights reserved.

* Corresponding author at: Faculty of Mathematics, St. Petersburg State University, Universitetskiy pr. 28, Petergof, St. Petersburg, 198504, Russia. Tel.: +49 241 809 4577; fax: +49 241 809 2130.
E-mail addresses: neg@math.spbu.ru (N. Golyandina), pepelyshev@stochastik.rwth-aachen.de (A. Pepelyshev), steland@stochastik.rwth-aachen.de (A. Steland).
0167-9473/$ – see front matter
doi:10.1016/j.csda.2011.12.019

1. Introduction

For the fundamental problem of nonparametric density estimation, various approaches have been proposed in the literature. The most widely used is kernel smoothing, which dates back to Rosenblatt (1956) and Parzen (1962) and is thoroughly discussed in Silverman (1986) and Scott (1992). Histogram smoothing using splines is discussed in Boneva et al. (1971). Another frequently used approach is orthogonal series estimation; see Efromovich (1999) and Bouezmarni and Rombouts (2010), among others. For smoothing cumulative distribution functions as well as quantile functions, an approach based on Bernstein polynomials has also been investigated; see Babu et al. (2002) and Cheng (1995). All these methods incorporate, in some way, a parameter that controls the degree of smoothing.

A crucial issue in density estimation is how to select this smoothing parameter. For kernel smoothing, a popular bandwidth selection method is least-squares cross-validation (LSCV), studied in Rudemo (1982) and Bowman (1984). However, LSCV may underestimate the bandwidth, leading to undersmoothed density estimates. To correct this, several so-called second-generation or plug-in methods have been proposed; see Scott and Terrell (1987), Sheather and Jones (1991), Hart and Yi (1998) and Chan et al. (2010). In these methods the optimal bandwidth depends on a functional based on