Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: A clinical example

Geert J.M.G. van der Heijden a,b,*, A. Rogier T. Donders a,c,d, Theo Stijnen e, Karel G.M. Moons a

a Julius Center for Health Sciences and Primary Care, University Medical Center, P.O. Box 80035, 3508 GA Utrecht, The Netherlands
b Heart Lung Centre Utrecht, University Medical Center, Utrecht, The Netherlands
c Department of Biostatistics, Utrecht University, Utrecht, The Netherlands
d Department of Innovation Studies, Copernicus Institute, Utrecht University, Utrecht, The Netherlands
e Department of Epidemiology and Biostatistics, Erasmus University Medical School, Rotterdam, The Netherlands

Accepted 10 January 2006

Abstract

Background and Objectives: To illustrate the effects of different methods for handling missing data (complete case analysis, missing-indicator method, single imputation of unconditional and conditional mean, and multiple imputation [MI]) in the context of multivariable diagnostic research aiming to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence.

Methods: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had (varying percentages of) missing values. Per method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis.

Results: The receiver operating characteristic curve area for all diagnostic models was above 0.75. The predictors in the final models based on the complete case analysis, and after using the missing-indicator method, were very different compared to the other models. The models based on MI did not differ much from the models derived after using single conditional and unconditional mean imputation.
Conclusion: In multivariable diagnostic research, complete case analysis and the use of the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. For our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Missing data; Complete case analysis; Single imputation; Multiple imputation; Indicator method; Bias; Precision

1. Introduction

Missing observations are frequently encountered and occur in all types of studies, no matter how strictly designed or how hard investigators try to prevent them. In diagnostic studies, as in other types of epidemiological studies including clinical trials and repeated measurement surveys, missing data often occur in a selective pattern. Patient referral for subsequent measurements, here diagnostic procedures, is commonly based on prior measurements, here prior test results, certainly when data are obtained from routine care. In diagnostic research this leads to the well-known referral (verification or work-up) bias [1]. Consider, for example, a study among children with neck stiffness. The aim was to quantify which diagnostic test results from patient history and physical examination predict the presence or absence of bacterial meningitis and which blood tests, e.g., leukocyte count or C-reactive protein level, have additional predictive value [2]. Patients who presented with severe signs, such as convulsions and high fever, were referred for additional blood testing more often and more quickly, before full completion of patient history and physical examination. On the other hand, for patients presenting with very mild or no symptoms, additional tests were done less often because the physician had already ruled out a serious disease early in the diagnostic process.
Accordingly, the sample of study subjects with complete data did not represent the group as a whole, and subjects with missing data carried important information on the associations studied.

* Corresponding author. Tel.: +31-30-2509377; fax: +31-30-2505485.
E-mail address: g.vanderheijden@umcutrecht.nl (G.J.M.G. van der Heijden).
0895-4356/06/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
doi: 10.1016/j.jclinepi.2006.01.015
Journal of Clinical Epidemiology 59 (2006) 1102-1109
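
The selective referral pattern described in the Introduction can be illustrated with a small simulation. The sketch below is not taken from the study: the severity score, the disease probability, and the referral probabilities (0.2 + 0.7 x severity) are invented purely for illustration. It only shows how, under such assumptions, the subjects with a complete test result can differ systematically from the cohort as a whole.

```python
import random


def simulate_cohort(n=10_000, seed=0):
    """Simulate a hypothetical cohort with severity-dependent referral.

    All quantities here are illustrative assumptions, not study data.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        severity = rng.random()                 # assumed severity score in [0, 1)
        diseased = rng.random() < severity      # disease is more likely when severe
        # Assumed referral rule: severe patients get the extra test more often.
        tested = rng.random() < 0.2 + 0.7 * severity
        cohort.append((severity, diseased, tested))
    return cohort


def prevalence(rows):
    """Fraction of subjects with the disease among the given rows."""
    return sum(diseased for _, diseased, _ in rows) / len(rows)


cohort = simulate_cohort()
# "Complete cases" are the subjects for whom the extra test was performed.
complete_cases = [row for row in cohort if row[2]]

prev_all = prevalence(cohort)
prev_complete = prevalence(complete_cases)

print(f"subjects: {len(cohort)}, complete cases: {len(complete_cases)}")
print(f"disease prevalence, whole cohort:   {prev_all:.3f}")
print(f"disease prevalence, complete cases: {prev_complete:.3f}")
```

Under these assumed probabilities the complete cases over-represent severe patients, so their disease prevalence is higher than that of the full cohort. An analysis restricted to complete cases would therefore describe a different population than the one that was enrolled.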