Computational Statistics and Data Analysis 72 (2014) 13–29
journal homepage: www.elsevier.com/locate/csda

Unimodal density estimation using Bernstein polynomials

Bradley C. Turnbull ∗, Sujit K. Ghosh
Department of Statistics, North Carolina State University, Raleigh, NC 27695, United States

Article history: Received 7 February 2013; received in revised form 18 October 2013; accepted 19 October 2013; available online 29 October 2013

Keywords: Bernstein polynomials; Density estimation; Mixture models; Unimodal

Abstract

The estimation of probability density functions is one of the fundamental aspects of any statistical inference. Many data analyses are based on an assumed family of parametric models, which are known to be unimodal (e.g., exponential family). Often a histogram suggests the unimodality of the underlying density function. Parametric assumptions, however, may not be adequate for many inferential problems. A flexible class of mixtures of Beta densities that are constrained to be unimodal is presented. It is shown that the estimation of the mixing weights, and the number of mixing components, can be accomplished using a weighted least squares criterion subject to a set of linear inequality constraints. The mixing weights of the Beta mixture are efficiently computed using quadratic programming techniques. Three criteria for selecting the number of mixing weights are presented and compared in a small simulation study. More extensive simulation studies are conducted to demonstrate the performance of the density estimates in terms of popular functional norms (e.g., L_p norms). The true underlying densities are allowed to be unimodal, either symmetric or skewed, with finite, infinite or semi-finite supports.
Code for an R function is provided, which allows the user to input a data set and returns the estimated density, distribution, quantile, and random sample generating functions.

© 2013 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +1 9195152528.
E-mail addresses: bcturnbu@ncsu.edu (B.C. Turnbull), sujit.ghosh@ncsu.edu (S.K. Ghosh).
0167-9473/$ – see front matter. http://dx.doi.org/10.1016/j.csda.2013.10.021

1. Introduction

Statistical inference is typically based on an assumed family of unimodal parametric models. Nonparametric density estimation is a popular alternative when that parametric assumption is not appropriate for modeling the density of the underlying population. The kernel method, developed by Parzen (1962), is one of the most popular methods of nonparametric density estimation. It is defined as the weighted average of kernel functions centered at the observed values. This average is taken with respect to the empirical cumulative distribution function (ECDF), F_n(·), and depends on a smoothing or bandwidth parameter.

If one believes the underlying population's density is unimodal, there are two major advantages to including a unimodality constraint in the density estimate. First, incorporating extra information about the shape of the density should improve the overall accuracy of the estimate. Second, extraneous modes, which may hinder the usefulness of the density estimate as a visual aid and exploratory tool, will be eliminated (Wolters, 2012).

1.1. Unimodal density estimation

Silverman (1981) developed a bandwidth test for unimodality stemming from a nonparametric density estimate. Unfortunately, this test cannot serve as the basis for a unimodal density estimate. The density estimate constructed by the test is smoothed in a global manner that is influenced solely by the features of the density located around the mode