Computational Statistics and Data Analysis 56 (2012) 2206–2218
New approaches to nonparametric density estimation and selection of
smoothing parameters
Nina Golyandina (a), Andrey Pepelyshev (a,b,∗), Ansgar Steland (b)
(a) Faculty of Mathematics, St.Petersburg State University, Universitetskiy pr. 28, Petergof, St.Petersburg, 198504, Russia
(b) Institute of Statistics, RWTH Aachen University, Wüllnerstr. 3, D-52056 Aachen, Germany
article info
Article history:
Received 9 March 2011
Received in revised form 18 November 2011
Accepted 24 December 2011
Available online 18 January 2012
Keywords:
Empirical distribution function
Time series smoothing
Singular Spectrum Analysis
Adaptive filter
Acceptance sampling plans
abstract
The application of Singular Spectrum Analysis (SSA) to the empirical distribution function
sampled at a grid of points spanning the range of the sample leads to a novel and promising
method for the computer-intensive nonparametric estimation of both the distribution
function and the density function. SSA yields a data-adaptive filter, whose length is a
parameter that controls the smoothness of the filtered series. A data-adaptive algorithm for
the automatic selection of a general smoothing parameter is introduced, which controls the
number of modes of the estimated density. Extensive computer simulations demonstrate
that the new automatic bandwidth selector improves on other popular methods for various
densities of interest. A general uniform error bound is proved for the proposed SSA estimate
of the distribution function, which ensures its uniform consistency. The simulation results
indicate that the SSA density estimate with the automatic choice of the filter length
outperforms the kernel density estimate in terms of the mean integrated squared error
and the Kolmogorov–Smirnov distance for various density shapes. Two applications to
problems arising in photovoltaic quality control and economic market research are studied
to illustrate the benefits of SSA estimation.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Various approaches to the fundamental problem of nonparametric density estimation have been proposed in the
literature. The most widely used approach is kernel smoothing, which dates back to Rosenblatt (1956) and Parzen (1962) and
is thoroughly discussed in Silverman (1986) and Scott (1992). Histogram smoothing using splines is discussed in Boneva
et al. (1971). Another frequently used approach is orthogonal series estimation, see Efromovich (1999) and Bouezmarni
and Rombouts (2010) amongst others. In order to smooth cumulative distribution functions as well as quantile functions,
an approach using Bernstein polynomials has also been investigated, see Babu et al. (2002) and Cheng (1995). All these
methods incorporate in some way a parameter controlling the degree of smoothing.
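To make the role of such a smoothing parameter concrete, the following minimal sketch (an illustration only, not code from this paper) evaluates a Gaussian kernel density estimate on a grid for two bandwidths. The function names `gaussian_kde` and `count_modes`, the simulated bimodal sample, the grid, and the bandwidth values are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kde(x_grid, sample, h):
    """Gaussian kernel density estimate on x_grid with bandwidth h."""
    x_grid = np.asarray(x_grid, dtype=float)
    sample = np.asarray(sample, dtype=float)
    # Scaled distance between every grid point and every observation.
    u = (x_grid[:, None] - sample[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Average the kernel values over the sample and rescale by h.
    return kernel.mean(axis=1) / h

def count_modes(density):
    """Count strict local maxima of a density evaluated on a grid."""
    d = np.diff(density)
    return int(np.sum((d[:-1] > 0) & (d[1:] < 0)))

rng = np.random.default_rng(0)
# Hypothetical sample: equal mixture of N(-2, 1) and N(2, 1).
sample = np.concatenate([rng.normal(-2.0, 1.0, 200),
                         rng.normal(2.0, 1.0, 200)])
grid = np.linspace(-20.0, 20.0, 2001)
dx = grid[1] - grid[0]

f_small = gaussian_kde(grid, sample, h=0.05)  # small bandwidth: rough estimate
f_large = gaussian_kde(grid, sample, h=3.0)   # large bandwidth: very smooth estimate

print("modes with h=0.05:", count_modes(f_small))
print("modes with h=3.0: ", count_modes(f_large))
```

In this sketch the small bandwidth produces many spurious modes, while the large bandwidth smooths away the structure of the sample, which is why the choice of the smoothing parameter matters.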
A crucial issue for density estimation is how to select a smoothing parameter. For kernel smoothing, a popular method
of bandwidth selection is least-squares cross-validation (LSCV) studied in Rudemo (1982) and Bowman (1984). However,
this method may yield an under-estimated bandwidth, leading to under-smoothed density estimates. To correct this issue,
several so-called second generation or plug-in methods have been proposed, see Scott and Terrell (1987), Sheather and Jones
(1991), Hart and Yi (1998) and Chan et al. (2010). In these methods the optimal bandwidth depends on a functional based on
∗ Corresponding author at: Faculty of Mathematics, St.Petersburg State University, Universitetskiy pr. 28, Petergof, St.Petersburg, 198504, Russia. Tel.: +49 241 809 4577; fax: +49 241 809 2130.
E-mail addresses: neg@math.spbu.ru (N. Golyandina), pepelyshev@stochastik.rwth-aachen.de (A. Pepelyshev), steland@stochastik.rwth-aachen.de (A. Steland).
doi:10.1016/j.csda.2011.12.019