Missing data in a multi-item instrument were best handled by multiple imputation at the item score level

Iris Eekhout a,b,c,*, Henrica C.W. de Vet a,b, Jos W.R. Twisk a,b,c, Jaap P.L. Brand d, Michiel R. de Boer c,e, Martijn W. Heymans a,b,c

a Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. box 7057, 1007 MB Amsterdam, The Netherlands
b EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands
c Department of Methodology and Applied Biostatistics, Faculty of Earth and Life Sciences, Institute for Health Sciences, VU University, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands
d Skyline Diagnostics, Marconistraat 16, 3029 AK Rotterdam, The Netherlands
e Department of Public Health, University Medical Center Groningen, PO box 196, 9700 AD Groningen, The Netherlands

Accepted 13 September 2013; Published online 2 December 2013

Abstract

Objectives: Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases when some, many, or all item scores are missing in a multi-item instrument.

Study Design and Setting: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the optimal technique for each scenario. Fitted regression coefficients were compared using the bias and coverage as performance parameters.

Results: Mean imputation caused biased estimates in every missing data scenario when data were missing for more than 10% of the subjects.
Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score.

Conclusion: We recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise not to use any form of mean imputation to handle missing data.

© 2014 Elsevier Inc. All rights reserved.

Keywords: Missing data; Multiple imputation; Multi-item questionnaire; Item imputation; Methods; Bias; Simulation

1. Introduction

Missing data on multi-item instruments are a frequently seen problem in epidemiological and medical studies. Multi-item instruments can be used to measure, for example, quality of life, coping ability, or other psychological states. A multi-item instrument generally consists of several items that measure one construct [1]; for example, the Pain Coping Inventory assesses active coping skills of people with pain complaints by 12 items [2]. Missing data on these kinds of instruments can occur as missing item scores, when several items are not completed, or as missing total scores, when the entire instrument is not filled out. Furthermore, missing item scores impair the calculation of the total score, which can lead to missing total scores as well.

For missing data in item and total scores, different missing data-handling methods are available, with complete-case analysis (CCA) as the most frequently used method [3]. In general, CCA tends to perform well under the strict assumption that the missing data are a completely random subsample of the data, in other words, missing completely at random (MCAR) [4]. However, CCA reduces power because of the decreased sample size. Single-imputation methods such as mean imputation of the total score and item mean imputation may be used to preserve the sample size by replacing the missing values with the mean score, but these methods reduce the variability in the data.
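The trade-off between CCA and mean imputation of the total score can be illustrated with a minimal sketch. It is not the simulation used in this study: the sample size, the 30% missingness rate, the item-generating model, and the random seed are all assumptions chosen for illustration, and only the MCAR case with fully missing instruments is shown.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed (assumption), for reproducibility only

# Hypothetical set-up: 500 subjects, 12 items (12 echoes the Pain Coping Inventory example).
n_subjects, n_items = 500, 12

# Correlated item scores: one shared latent construct plus item-specific noise.
latent = rng.normal(size=(n_subjects, 1))
items = latent + rng.normal(scale=0.8, size=(n_subjects, n_items))
total_true = items.sum(axis=1)

# MCAR: about 30% of subjects (assumed rate) did not fill out the instrument at all.
missing = rng.random(n_subjects) < 0.30
total_obs = np.where(missing, np.nan, total_true)

# Complete-case analysis: keep only subjects with an observed total score.
complete_cases = total_obs[~missing]

# Mean imputation of the total score: replace each missing total by the observed mean.
total_mean_imputed = np.where(missing, np.nanmean(total_obs), total_obs)

print("n complete cases:      ", complete_cases.size)
print("n after mean imputation:", total_mean_imputed.size)
print("variance, complete cases:", complete_cases.var(ddof=1))
print("variance, mean-imputed:  ", total_mean_imputed.var(ddof=1))
```

In this sketch CCA keeps the observed spread of the total score but discards subjects, whereas mean imputation keeps every subject but adds values that sit exactly at the mean, so the sample variance of the imputed variable is smaller than that of the complete cases.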
Funding: This work was financially supported by EMGO Institute of Health and Care Research.
Conflict of interest: None.
* Corresponding author. Department of Epidemiology and Biostatistics, VU University Medical Center, Room MF D439, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands. Tel.: +31-204446040; Fax: +31 20 444 8181.
E-mail address: i.eekhout@vumc.nl (I. Eekhout).
0895-4356/$ - see front matter © 2014 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jclinepi.2013.09.009
Journal of Clinical Epidemiology 67 (2014) 335-342

Single stochastic regression imputation (SRI) uses