Journal of Econometrics 184 (2015) 452–463
Journal homepage: www.elsevier.com/locate/jeconom

Model averaging estimation of generalized linear models with imputed covariates

Valentino Dardanoni a, Giuseppe De Luca a, Salvatore Modica a, Franco Peracchi b

a University of Palermo, Italy
b University of Rome Tor Vergata and Einaudi Institute for Economics and Finance (EIEF), Italy

Article info

Article history: Received 3 July 2013; received in revised form 19 March 2014; accepted 15 June 2014; available online 24 June 2014

JEL classification: C11, C25, C35, C81

Keywords: Model averaging; Bayesian averaging of maximum likelihood estimators; Generalized linear models; Missing covariates; Generalized missing-indicator method; SHARE

Abstract

We address the problem of estimating generalized linear models when some covariate values are missing but imputations are available to fill in the missing values. This situation generates a bias-precision trade-off in the estimation of the model parameters. Extending the generalized missing-indicator method proposed by Dardanoni et al. (2011) for linear regression, we handle this trade-off as a problem of model uncertainty using Bayesian averaging of classical maximum likelihood estimators (BAML). We also propose a block model averaging strategy that incorporates information on the missing-data patterns and is computationally simple. An empirical application illustrates our approach.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In this paper we address the problem of estimating generalized linear models (GLMs) when the outcome of interest is always observed, some covariate values are missing, and imputations are available to fill in the missing values. This situation is becoming quite common, as public-use data files increasingly include imputations of key variables affected by item nonresponse.
The focus of this paper is on how to make use of the available imputations, not on methods to impute the missing values.

Two standard approaches to the problem of missing covariate values are complete-case analysis and the fill-in approach. The first drops all the observations with missing values, ignoring the imputations altogether, while the second fills in the missing values with the available imputations without distinguishing between observed and imputed values. Under certain conditions on the missing-data mechanism and the imputation model, the choice between these two approaches generates a trade-off between bias and precision in the estimation of the parameters of interest. When the complete cases are few, the loss of precision may be substantial, but just filling in the missing values with the imputations may lead to bias when the imputation model is either incorrectly specified or uncongenial in the sense of Meng (1994), that is, the imputation model is more restrictive than the model used to analyze the filled-in data.

Validity of the assumptions behind the fill-in approach is often taken for granted, so this bias-precision trade-off is usually ignored. However, when imputations are provided by an external source, the congeniality assumption may fail because the two models are based on different parametric assumptions or they condition on different sets of covariates. The estimates from the fill-in approach may therefore be inconsistent, especially in the case of nonlinear estimators.

Corresponding author. Tel.: +39 06 7259 5934; fax: +39 06 2040 219. E-mail address: franco.peracchi@uniroma2.it (F. Peracchi).
http://dx.doi.org/10.1016/j.jeconom.2014.06.002
0304-4076/© 2014 Elsevier B.V. All rights reserved.

Using the generalized missing-indicator approach originally proposed for linear regression by Dardanoni et al. (2011), we transform the bias-precision trade-off between complete-case analysis and the fill-in approach into a problem of model uncertainty regarding which covariates should be dropped from an augmented GLM, or ‘grand model’, which includes two subsets of regressors: