Journal of Econometrics 184 (2015) 452–463
Model averaging estimation of generalized linear models with
imputed covariates
Valentino Dardanoni a, Giuseppe De Luca a, Salvatore Modica a, Franco Peracchi b,∗
a University of Palermo, Italy
b University of Rome Tor Vergata and Einaudi Institute for Economics and Finance (EIEF), Italy
article info
Article history:
Received 3 July 2013
Received in revised form 19 March 2014
Accepted 15 June 2014
Available online 24 June 2014
JEL classification:
C11
C25
C35
C81
Keywords:
Model averaging
Bayesian averaging of maximum likelihood estimators
Generalized linear models
Missing covariates
Generalized missing-indicator method
SHARE
abstract
We address the problem of estimating generalized linear models when some covariate values are missing
but imputations are available to fill in the missing values. This situation generates a bias-precision
trade-off in the estimation of the model parameters. Extending the generalized missing-indicator method
proposed by Dardanoni et al. (2011) for linear regression, we handle this trade-off as a problem of
model uncertainty using Bayesian averaging of classical maximum likelihood estimators (BAML). We also
propose a block model averaging strategy that incorporates information on the missing-data patterns and
is computationally simple. An empirical application illustrates our approach.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
In this paper we address the problem of estimating generalized
linear models (GLMs) when the outcome of interest is always
observed, some covariate values are missing, and imputations are
available to fill in the missing values. This situation is becoming
quite common, as public-use data files increasingly include
imputations of key variables affected by item nonresponse. The focus of
this paper is on how to make use of the available imputations, not
on methods to impute the missing values.
∗ Corresponding author. Tel.: +39 06 7259 5934; fax: +39 06 2040 219.
E-mail address: franco.peracchi@uniroma2.it (F. Peracchi).
Two standard approaches to the problem of missing covariate
values are complete-case analysis and the fill-in approach.
The first drops all observations with missing values, ignoring
the imputations altogether, while the second fills in the missing
values with the available imputations without distinguishing
between observed and imputed values. Under certain conditions on
the missing-data mechanism and the imputation model, the choice
between these two approaches generates a trade-off between
bias and precision in the estimation of the parameters of interest.
When the complete cases are few, the loss of precision may be
substantial, but just filling in the missing values with the imputations
may lead to bias when the imputation model is either incorrectly
specified or uncongenial in the sense of Meng (1994), that is, the
imputation model is more restrictive than the model used to
analyze the filled-in data. Validity of the assumptions behind the fill-in
approach is often taken for granted, so this bias-precision trade-off
is usually ignored. However, when imputations are provided by
an external source, the congeniality assumption may fail because
the two models are based on different parametric assumptions or
they condition on different sets of covariates. The estimates from
the fill-in approach may therefore be inconsistent, especially in the
case of nonlinear estimators.
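To make the two approaches concrete, the following minimal Python sketch builds the estimation samples that complete-case analysis and the fill-in approach would each pass to a GLM routine. It is an illustration only, not the authors' code: the function names `complete_case` and `fill_in`, the toy data, and the imputed values are all hypothetical, and no model is actually estimated.

```python
from typing import List, Optional, Tuple


def complete_case(
    y: List[float], x: List[Optional[float]]
) -> Tuple[List[float], List[float]]:
    """Drop every observation whose covariate value is missing (None)."""
    kept = [(yi, xi) for yi, xi in zip(y, x) if xi is not None]
    return [yi for yi, _ in kept], [xi for _, xi in kept]


def fill_in(x: List[Optional[float]], imputed: List[float]) -> List[float]:
    """Replace each missing covariate value with its imputation.

    `imputed` is aligned with `x`; its entries are used only where x is None.
    """
    return [ximp if xi is None else xi for xi, ximp in zip(x, imputed)]


# Toy data, made up for illustration: the outcome is always observed,
# while the covariate is missing for two of the five observations.
y = [1.0, 0.0, 1.0, 1.0, 0.0]
x = [0.5, None, 1.2, None, -0.3]
# Hypothetical imputations, e.g. supplied with a public-use data file.
imputed = [0.5, 0.1, 1.2, 0.9, -0.3]

# Complete-case sample: fewer observations, imputations ignored.
y_cc, x_cc = complete_case(y, x)

# Filled-in sample: all observations, but observed and imputed values
# are no longer distinguishable unless an indicator is kept alongside.
x_filled = fill_in(x, imputed)
missing_indicator = [int(xi is None) for xi in x]

print(len(y_cc), len(x_filled))
print(missing_indicator)
```

The complete-case sample keeps three of the five observations, which is the source of the precision loss, whereas the filled-in sample keeps all five but inherits any error in the imputations. The `missing_indicator` vector is recorded here only to show which values were imputed; the fill-in approach itself does not use it.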
Using the generalized missing-indicator approach originally
proposed for linear regression by Dardanoni et al. (2011), we
transform the bias-precision trade-off between complete-case analysis
and the fill-in approach into a problem of model uncertainty
regarding which covariates should be dropped from an augmented
GLM, or ‘grand model’, which includes two subsets of regressors:
http://dx.doi.org/10.1016/j.jeconom.2014.06.002
0304-4076/© 2014 Elsevier B.V. All rights reserved.