Lifetime Data Analysis, 7, 111–123, 2001
© 2001 Kluwer Academic Publishers. Printed in The Netherlands.

A Multiple Imputation Approach to Linear Regression with Clustered Censored Data

WEI PAN weip@biostat.umn.edu
Division of Biostatistics, School of Public Health, A460 Mayo Building, University of Minnesota, Minneapolis, MN 55455

JOHN E. CONNETT john-c@biostat.umn.edu
Division of Biostatistics, School of Public Health, A460 Mayo Building, University of Minnesota, Minneapolis, MN 55455

Received June 24, 1999; Revised January 10, 2000; Accepted July 14, 2000

Abstract. We extend Wei and Tanner’s (1991) multiple imputation approach in semi-parametric linear regression for univariate censored data to clustered censored data. The main idea is to iterate the following two steps: 1) using data augmentation to impute the censored failure times; 2) fitting a linear model to the imputed complete data in a way that takes the clustering among failure times into account. In particular, we propose using generalized estimating equations (GEE) or a linear mixed-effects model to implement the second step. In simulation studies, our proposal compares favorably with the independence approach (Lee et al., 1993), which ignores the within-cluster correlation in estimating the regression coefficient. Our proposal is easy to implement using existing software.

Keywords: asymptotic normal data augmentation, Buckley-James method, GEE, generalized least squares, mixed-effects model, Poor Man’s data augmentation

1. Introduction

We consider the situation where the failure times fall into a large number of small, independent clusters or groups. Within each cluster, the failure times may be correlated. To adjust for the effects of observed covariates, we use the Accelerated Failure Time (AFT) model. The AFT model is an important alternative to the Cox proportional hazards model. For example, Wei (1992) argued that the AFT model is easier to interpret.
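To fix ideas, the AFT model for clustered censored data can be sketched as follows. The notation is assumed here for illustration and is not taken from the text: $i$ indexes clusters, $j$ indexes members within a cluster, $T_{ij}$ is a failure time, $C_{ij}$ a censoring time, $x_{ij}$ a covariate vector, $\beta$ the regression coefficient, and $\varepsilon_{ij}$ an error term with unspecified distribution.

```latex
\begin{align*}
\log T_{ij} &= \beta^{\top} x_{ij} + \varepsilon_{ij},
  \qquad i = 1, \dots, n, \quad j = 1, \dots, m_i, \\
Y_{ij} &= \min\bigl(\log T_{ij}, \log C_{ij}\bigr),
  \qquad \delta_{ij} = 1\{T_{ij} \le C_{ij}\}.
\end{align*}
```

Under this sketch only $(Y_{ij}, \delta_{ij}, x_{ij})$ is observed. The errors $\varepsilon_{i1}, \dots, \varepsilon_{im_i}$ within cluster $i$ may be correlated, while errors from different clusters are independent.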
The AFT model specifies that the logarithm (or, more generally, any monotone transformation) of a failure time is linearly related to the observed covariates. Hence, regression analysis based on the AFT model is also called linear regression. Of course, the presence of censoring greatly complicates the analysis. Many approaches have been proposed for linear regression with univariate failure time data. Two of the most popular are the Buckley-James method (Buckley and James, 1979; Lai and Ying, 1991) and the approach based on linear rank statistics (Louis, 1981; Tsiatis, 1990). Wei and Tanner (1991) proposed a multiple imputation approach using two data augmentation schemes. On the other hand, for clustered multivariate failure time data, Lee et al. (1993) presented an independence approach, in which the correlation of the failure times within a cluster is ignored in estimating the regression coefficient. The independence approach is likely to be inefficient. Considering the popularity of the generalized