Biometrics 66, 1209–1219
December 2010
DOI: 10.1111/j.1541-0420.2009.01382.x

Correction for Covariate Measurement Error in Nonparametric Longitudinal Regression

D. Rummel, T. Augustin, and H. Küchenhoff

Department of Statistics, Ludwig-Maximilians-University Munich, Ludwigstraße 33, 80539 Munich, Germany
email: thomas@stat.uni-muenchen.de

Summary. We introduce a correction for covariate measurement error in nonparametric regression applied to longitudinal binary data arising from a study on human sleep. The data were collected to investigate the association between certain hormonal levels and the probability of being asleep. The hormonal effect is modeled flexibly, while accounting for the error-prone measurement of the hormone concentration in the blood and for the longitudinal character of the data. We present a fully Bayesian treatment using Markov chain Monte Carlo inference techniques, and we introduce block updating to improve sampling and computational performance in the binary case. Our model is partly inspired by the relevance vector machine with radial basis functions, in which typically very few basis functions are selected automatically to fit the data. In the proposed approach, we implement such data-driven complexity regulation by adopting the idea of Bayesian model averaging. In addition to the general theory and the detailed sampling scheme, we provide a simulation study for the Gaussian and binary cases that compares our method with the naive analysis ignoring measurement error. The results demonstrate a clear gain from the proposed correction method, particularly for the Gaussian case with medium and large measurement error variances, even if the covariate model is misspecified.

Key words: Bayesian methods; Covariate measurement error; Human sleep data; Nonparametric regression of longitudinal data; Relevance vector machine.

1. Introduction

Covariate measurement error has been recognized as an important issue in many areas of application (see, e.g., Gustafson, 2003, and Carroll et al., 2006, for an extensive discussion), and it has been demonstrated that statistical analysis ignoring such inherent error yields invalid results. The topic has generated major research interest in the fields of medicine and epidemiology (cf., e.g., Willet, 1998). In these fields, for instance, the individual exposure of study participants to certain radiation, or their nutritional habits, are recorded and the influence on disease is investigated. The motivation for our proposed method arises from data collected in a sleep laboratory, similar to the data analyzed by Yassouridis et al. (1999). To study the determinants of sleep, hormonal levels and the sleeping state of 33 participants were recorded every 20 minutes during the night. For the data analysis, we want to use a model that specifies the hormonal effect nonparametrically while accounting for the error-prone measurement of the hormonal levels and for the longitudinal nature of the data.

However, nonparametric regression allowing for covariate measurement error is a difficult task even for cross-sectional data. Carroll, Maca, and Ruppert (1999) developed two approximately consistent estimators based on the flexible, yet parametric, method of regression splines. Besides a simulation-based approach, they proposed a so-called structural approach to regression splines, in which the original polynomial basis functions are modified to account for the inherent measurement error. However, when estimating the smoothing parameter via generalized cross-validation, they encountered the problem of undersmoothing. This problem is particularly severe in the presence of measurement error, because cross-validation necessarily has to rely on the error-prone observations only. To mitigate this problem, Carroll et al. (1999) suggested a mean-squared error (MSE) estimation procedure, whereas Rummel (2004) performed the smoothing parameter estimation in a Bayesian context. He adopted the flexible regression model used in the relevance vector machine presented by Tipping (2001). This method uses radial basis functions and places a prior distribution on each basis function coefficient, which amounts to an individual smoothing parameter for each basis function. The smoothing parameters are estimated by maximizing the marginal likelihood, and a basis function is excluded from the model if its smoothing parameter becomes too large. Similar to the approach of Carroll et al. (1999), Rummel (2004) modified the radial basis functions to account for the mismeasured covariate. Because he used the relevance vector machine model, however, the undesirable undersmoothing does not occur. Berry, Carroll, and Ruppert (2002) proposed a fully Bayesian method in which they start, e.g., from a regression splines model and sample values of the true but latent covariate ξ together with all other unknown parameters in a Markov chain Monte Carlo (MCMC) algorithm. Using these sampled ξ's yields error-corrected estimates of all other parameters, including the smoothing parameter, and thus overcomes the problem encountered earlier by Carroll et al. (1999).

For the analysis of our data from the sleep laboratory, as well as for many other survey data sets, e.g., in the fields of epidemiology and the social sciences, the extension of the cross-sectional
© 2010, The International Biometric Society 1209