doi: 10.1111/j.1467-9469.2006.00506.x
Board of the Foundation of the Scandinavian Journal of Statistics 2006. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. Vol 33: 451–462, 2006

A Simple Estimator of Error Correlation in Non-parametric Regression Models

BYEONG U. PARK and YOUNG KYUNG LEE
Department of Statistics, Seoul National University

TAE YOON KIM and CHEOLYONG PARK
Department of Statistics, Keimyung University

ABSTRACT. It is well known that the major strength of non-parametric regression function estimation breaks down when correlated errors exist in the data. Positively (negatively) correlated errors tend to produce undersmoothing (oversmoothing). Several remedies have been proposed in the context of the bandwidth selection problem, but they are hard to implement without prior knowledge of the error correlations. In this paper we propose a simple estimator of error correlation which is easy to implement and performs reasonably well.

Key words: bandwidth selection, bimodal kernel, correlated errors, kernel regression, residuals

1. Introduction

Non-parametric regression function estimation typically assumes that the responses $Y_i$ obey the model $Y_i = m(x_i) + \epsilon_i$, $i = 1, \ldots, n$, where the $x_i$ are the design points, $m$ is the unknown mean function, and the $\epsilon_i$ are i.i.d. errors. A major strength of this model is that no specific or parametric form of $m$ is assumed in advance, which gives the data themselves more chance to speak about $m$. However, many researchers have noticed that correlated errors, if they exist in the data, can complicate the model and eventually undermine this strength significantly. The reason is that data with correlated errors tend to distort the true $m$ severely, so that the model easily fails. Positively correlated errors produce an image of a less smooth curve, while negatively correlated errors generate a smoother image.
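The effect of positively correlated errors on a data-driven bandwidth can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the estimator proposed in this paper: the AR(1) error model, the mean function sin(2πx), the Gaussian-kernel Nadaraya-Watson smoother, leave-one-out cross-validation, and all function names (`ar1_errors`, `loo_cv_score`, `cv_bandwidth`, `demo`) are choices made here for illustration.

```python
import numpy as np

def ar1_errors(n, phi, sigma, rng):
    """Stationary AR(1) errors with lag-1 correlation phi and marginal sd sigma.
    (Assumed error model for illustration; phi = 0 gives i.i.d. errors.)"""
    innov = rng.normal(0.0, sigma * np.sqrt(1.0 - phi**2), size=n)
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma)
    for i in range(1, n):
        e[i] = phi * e[i - 1] + innov[i]
    return e

def loo_cv_score(x, y, h):
    """Leave-one-out CV score of a Gaussian-kernel Nadaraya-Watson fit."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)           # drop the i-th point from its own fit
    fit = (w @ y) / w.sum(axis=1)
    return float(np.mean((y - fit) ** 2))

def cv_bandwidth(x, y, grid):
    """Bandwidth on the grid that minimizes the leave-one-out CV score."""
    scores = [loo_cv_score(x, y, h) for h in grid]
    return float(grid[int(np.argmin(scores))])

def demo(seed=0, n=200, sigma=0.5, phi=0.8):
    """Return CV bandwidths for i.i.d. errors and for AR(1) errors."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n        # equally spaced design points
    m = np.sin(2.0 * np.pi * x)        # hypothetical mean function
    grid = np.linspace(0.005, 0.3, 60)
    h_iid = cv_bandwidth(x, m + ar1_errors(n, 0.0, sigma, rng), grid)
    h_corr = cv_bandwidth(x, m + ar1_errors(n, phi, sigma, rng), grid)
    return h_iid, h_corr

if __name__ == "__main__":
    h_iid, h_corr = demo()
    print("CV bandwidth, i.i.d. errors:", h_iid)
    print("CV bandwidth, AR(1) errors :", h_corr)
```

With strongly positively correlated errors, the neighbours of each point predict its error well, so cross-validation tends to favour a very small bandwidth and the fitted curve follows the noise. This is the undersmoothing described above.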
It is well known that the difficulty is more serious for positively correlated errors; see Altman (1990) and Hart (1991), for example.

Remedies to the correlated error problem in non-parametric regression have been researched and offered in the context of bandwidth selection, mainly because the bandwidth in non-parametric regression determines the shape, or the degree of smoothness, of an estimate of $m$. We refer to Opsomer et al. (2001) for an excellent review. This problem, seemingly well understood and equipped with several remedies, suffers from a major obstacle: the errors are not observable. In other words, although the severity of the problem is well understood and remedies are available, it is hard to diagnose the problem from a given data set. Recently, Kim et al. (2004) showed that detection of correlated errors is possible in non-parametric regression from a given data set. But the problem still persists when one wants to implement the remedies, simply because they usually require the strength of correlation among the errors to be known or estimated in advance. See the discussion in the next paragraph. The main aim of this paper is to provide a simple estimator of the unknown amount of correlation among the errors.

The standard techniques for bandwidth selection, such as cross-validation and generalized cross-validation, are known to break down when the errors are significantly positively