Biometrika (2009), 96, 2, pp. 383–398
doi: 10.1093/biomet/asp015
© 2009 Biometrika Trust
Printed in Great Britain

Nonparametric additive regression for repeatedly measured data

BY RAYMOND J. CARROLL
Department of Statistics, Texas A&M University, College Station, Texas 77843, U.S.A.
carroll@stat.tamu.edu

ARNAB MAITY
Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, U.S.A.
amaity@hsph.harvard.edu

ENNO MAMMEN AND KYUSANG YU
Department of Economics, University of Mannheim, L 7, 3-5, 68131 Mannheim, Germany
emammen@rumms.uni-mannheim.de kyusangu@yahoo.co.kr

SUMMARY

We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements. We allow for a working covariance matrix for the regression errors, showing that our method is most efficient when the correct covariance matrix is used. The component functions achieve the known asymptotic variance lower bound for the scalar argument case. Smooth backfitting also leads directly to design-independent biases in the local linear case. Simulations show that our estimator has smaller variance than the usual kernel estimator. This is also illustrated by an example from nutritional epidemiology.

Some key words: Additive model; Generalized least squares; Nonparametric regression; Repeated measures; Smooth backfitting.

1. INTRODUCTION

We consider efficient estimation of an additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is a considerable literature on the scalar covariate case (see below), the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction.

There has been much interest in the simplest version of this problem. Suppose that there are $i = 1, \ldots, n$ individuals and $j = 1, \ldots, J$ observations per individual. The responses are $Y_{ij}$ and the scalar predictors are $X_{ij}$. A simple model says that, given $(X_{i1}, \ldots, X_{iJ})$,
$$
Y_{ij} = m_{\mathrm{true}}(X_{ij}) + \epsilon_{ij}, \qquad
\mathrm{cov}(\epsilon_i) = \mathrm{cov}\{(\epsilon_{i1}, \ldots, \epsilon_{iJ})^{\mathrm{T}}\} = \Sigma_{\mathrm{true}}, \tag{1}
$$
where $\epsilon_{ij}$ has mean zero.
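To make the data structure in model (1) concrete, here is a minimal Python sketch that simulates repeated-measures data of this form. The particular true function $m_{\mathrm{true}}$, the exchangeable form chosen for $\Sigma_{\mathrm{true}}$, and all parameter values are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def simulate_model_1(n=200, J=4, rho=0.5, sigma2=1.0, seed=1):
    """Simulate data from model (1): Y_ij = m_true(X_ij) + eps_ij,
    with cov(eps_i1, ..., eps_iJ) = Sigma_true.

    Assumption: Sigma_true is exchangeable, sigma2 * {(1 - rho) I + rho 1 1^T};
    the paper allows a general covariance matrix.
    """
    rng = np.random.default_rng(seed)

    def m_true(x):
        # Illustrative true regression function; any smooth function would do.
        return np.sin(2.0 * np.pi * x)

    # Within-subject error covariance (J x J), shared across subjects.
    Sigma_true = sigma2 * ((1.0 - rho) * np.eye(J) + rho * np.ones((J, J)))

    X = rng.uniform(0.0, 1.0, size=(n, J))  # scalar predictor for each (i, j)
    eps = rng.multivariate_normal(np.zeros(J), Sigma_true, size=n)  # mean-zero errors
    Y = m_true(X) + eps
    return X, Y, Sigma_true

X, Y, Sigma = simulate_model_1()
print(X.shape, Y.shape)  # (200, 4) (200, 4)
```

A generalized least squares style kernel estimator would weight the $J$ observations of each subject by the inverse of a working covariance matrix; as the summary states, the resulting smooth backfitting estimator is most efficient when this working matrix equals $\Sigma_{\mathrm{true}}$.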