ECONOMETRIC REVIEWS, 20(3), 353–367 (2001)
Copyright © 2001 by Marcel Dekker, Inc. www.dekker.com

DENSITY ESTIMATION FOR CLUSTERED DATA

Robert V. Breunig

Department of Statistics and Econometrics, The Australian National University, Canberra ACT 0200, Australia

ABSTRACT

The commonly used survey technique of clustering introduces dependence into sample data. Such data is frequently used in economic analysis, though the dependence induced by the sample structure of the data is often ignored. In this paper, the effect of clustering on the non-parametric kernel estimate of the density, f(x), is examined. The window width commonly used for density estimation for the case of i.i.d. data is shown to no longer be optimal. A new optimal bandwidth using a higher-order kernel is proposed and is shown to give a smaller integrated mean squared error than two window widths which are widely used for the case of i.i.d. data. Several illustrations from simulation are provided.

Key Words: Bandwidth choice; Cluster sampling; Dependent data; Kernel density estimation.

JEL Classification: C14, C42.

1. INTRODUCTION

The technique of non-parametric density estimation using kernel methods for data which is independently and identically distributed (i.i.d.) is well developed. It is not clear, however, how these results are affected when the i.i.d. assumption is violated, although some work has examined density estimation under weakly dependent time series observations. (See, for example, Hall, Lahiri, and Polzehl (1995) and Herrmann, Gasser, and Kneip (1992).) In this paper, I consider a