Biometrics 66, 415–425
June 2010
DOI: 10.1111/j.1541-0420.2009.01299.x

Regression Analysis with a Misclassified Covariate from a Current Status Observation Scheme

Leilei Zeng,1,∗ Richard J. Cook,2 and Theodore E. Warkentin3

1 Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
2 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
3 Department of Medicine, Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8L 2X2, Canada
∗ email: lzeng@sfu.ca

Summary. Naive use of misclassified covariates leads to inconsistent estimators of covariate effects in regression models. A variety of methods have been proposed to address this problem, including likelihood, pseudo-likelihood, estimating equation, and Bayesian methods, all of which typically require either internal or external validation samples or replication studies. We consider a problem arising from a series of orthopedic studies in which interest lies in examining the effect of a short-term serological response and other covariates on the risk of developing a longer term thrombotic condition called deep vein thrombosis. The serological response is an indicator of whether the patient developed antibodies following exposure to an antithrombotic drug, but the seroconversion status of patients is only available at the time of a blood sample taken upon discharge from hospital. The seroconversion time is therefore subject to a current status observation scheme, or case I interval censoring, and subjects tested before seroconversion are misclassified as nonseroconverters. We develop a likelihood-based approach for fitting regression models that accounts for misclassification of the seroconversion status due to early testing using parametric and nonparametric estimates of the seroconversion time distribution.
Simulation studies show that the method reduces the bias resulting from naive analyses, and an application to the data from the orthopedic studies provides further illustration.

Key words: Current status observation; EM algorithm; Misclassified covariate; Nonparametric estimation; Regression model.

1. Introduction

Accurate and reliable measurement of prognostic variables is often a considerable challenge. It is well known that naive use of mismeasured covariates in regression models leads to inconsistent estimators of covariate effects (Carroll, Ruppert, and Stefanski, 1995; Carroll, 1998; Yi and Cook, 1998), and so there has been considerable work on the development of efficient and robust methods for obtaining estimators with better properties. These methods may be based on likelihoods (e.g., Schafer and Purdy, 1996), pseudo-likelihoods (e.g., Carroll, Gail, and Lubin, 1993; Hanfelt and Liang, 1997; Lawless, Kalbfleisch, and Wild, 1999), Bayesian methods (Gustafson, 2004), or estimating functions (Nakamura, 1990; Pepe and Fleming, 1991). All of these methods typically require either internal or external validation samples or replication studies in order to estimate the parameters of the misclassification distribution (Carroll et al., 2006).

The problem of current interest arose in a collaborative thrombosis research program. Prophylaxis with antithrombotic heparin-based therapies is known to be highly effective in reducing the risk of thrombosis and is now standard practice in orthopedic surgery (White et al., 1998). We consider a problem arising from secondary analyses of data from four multicenter randomized trials involving patients undergoing orthopedic surgery (Bauer et al., 2001; Eriksson et al., 2001; Lassen et al., 2002; Turpie et al., 2002) designed to investigate the relative performance of enoxaparin and fondaparinux on the risk of deep vein thrombosis (DVT) following hip or knee replacement.
Some patients undergoing orthopedic surgery and exposed to antithrombotic drugs experience serological responses, and current interest lies in understanding the impact of such a serological response on the risk of DVT and other thrombotic events.

Patients are seronegative before surgery, and among seroconverters, antibodies are known to develop over a period of approximately 10 days following surgery. If all patients were tested after this 10-day period, then seroconversion status would be known exactly. However, patients provide a blood sample at the time of discharge from hospital, and discharge times vary considerably across patients. The seroconversion time is therefore said to be subject to a “current status” observation scheme, or case I interval censoring (Sun, 2006). While the seroconversion time itself is not of interest, it is necessary to take into account the timing of the blood sample to accommodate the fact that some patients may have been tested before they developed the antibody response; negative blood tests in this case represent false-negative classifications of the true seroconversion status.

© 2009, The International Biometric Society

There exists an extensive literature on methods for estimating survivor functions of failure times under current