STATISTICS IN MEDICINE
Statist. Med. 2002; 21:2421–2436 (DOI: 10.1002/sim.1195)

A weighted estimating equation for linear regression with missing covariate data

Michael Parzen¹, Stuart R. Lipsitz², Joseph G. Ibrahim³,∗,† and Steven Lipshultz⁴

¹ University of Chicago, U.S.A.
² Medical University of South Carolina, U.S.A.
³ Harvard School of Public Health and Dana-Farber Cancer Institute, Boston, U.S.A.
⁴ University of Rochester Medical Center, U.S.A.

∗ Correspondence to: Joseph Ibrahim, Department of Biostatistics, Harvard School of Public Health and Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, U.S.A.
† E-mail: ibrahim@jimmy.harvard.edu

Contract/grant sponsor: United States Institutes of Health; contract/grant numbers: HL69800, AHRQ 10871, HL53329, HL61769, CA70101

Received July 2000
Accepted October 2001

SUMMARY

Linear regression is one of the most popular statistical techniques. In linear regression analysis, missing covariate data occur often. A recent approach to analyse such data is a weighted estimating equation. With weighted estimating equations, the contribution to the estimating equation from a complete observation is weighted by the inverse 'probability of being observed'. In this paper, we propose a weighted estimating equation in which we wrongly assume that the missing covariates are multivariate normal, but which still produces consistent estimates as long as the probability of being observed is correctly modelled. In simulations, these weighted estimating equations appear to be highly efficient when compared to the most efficient weighted estimating equation as proposed by Robins et al. and Lipsitz et al. However, these weighted estimating equations, in which we wrongly assume that the missing covariates are multivariate normal, are much less computationally intensive than the weighted estimating equations given by Lipsitz et al. We compare the weighted estimating equations proposed in this paper to the efficient weighted estimating equations via an example and a simulation study. We only consider missing data which are missing at random; non-ignorably missing data are not addressed in this paper. Copyright © 2002 John Wiley & Sons, Ltd.

KEY WORDS: missing at random; missing completely at random; missing data mechanism

1. INTRODUCTION

Missing covariate data are a common occurrence in linear regression analysis. In this paper we consider a regression analysis of an outcome $y$ on a vector $x = (x_1, \ldots, x_p)'$ of covariates which are always observed, and a vector of covariates $z = (z_1, \ldots, z_m)'$ that either has all elements observed or all elements unobserved. A complete-case analysis, which excludes