JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, Theory and Methods
Optimal Model Averaging Estimation for Generalized Linear Models and Generalized Linear Mixed-Effects Models

Xinyu Zhang^a, Dalei Yu^b, Guohua Zou^a, and Hua Liang^c

^a Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and School of Mathematical Science, Beijing, China; ^b Statistics and Mathematics College, Yunnan University of Finance and Economics, Kunming, China; ^c Department of Statistics, George Washington University, Washington, DC, USA
ARTICLE HISTORY
Received August; Revised October

KEYWORDS
Asymptotic optimality; Kullback–Leibler loss; Misspecification; Penalized generalized weighted least squares (PGWLS); Prediction accuracy
ABSTRACT
Considering model averaging estimation in generalized linear models, we propose a weight choice criterion based on the Kullback–Leibler (KL) loss with a penalty term. This criterion differs in principle from those for continuous observations, but reduces to the Mallows criterion in that situation. We prove that the corresponding model averaging estimator is asymptotically optimal under certain assumptions. We further extend our approach to the generalized linear mixed-effects model framework and establish the associated theory. Numerical experiments illustrate that the proposed method is promising.
1. Introduction
Model averaging, unlike most variable selection studies, which focus on identifying important predictors, aims to improve prediction accuracy given several predictors (Ando and Li 2014). Using variable selection (or model selection), we end up putting all our inferential eggs in one unevenly woven basket (Longford 2005), whereas model averaging is a smoothed extension of model selection that can substantially reduce risk relative to selection (Hansen 2014). Bayesian model averaging (BMA) has long been a popular statistical technique; see Hoeting et al. (1999) for a comprehensive review of this literature. Unlike BMA, where models are usually weighted by their posterior model probabilities, the current article focuses on methods for determining weights from a frequentist perspective.
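To fix ideas, a frequentist model averaging estimator has the following generic form (the notation here is illustrative, not quoted from this article): M candidate estimates are combined with weights constrained to the unit simplex,

```latex
\hat{\mu}(w) \;=\; \sum_{m=1}^{M} w_m \,\hat{\mu}_m ,
\qquad
w \in \mathcal{W} \;=\; \Big\{ w \in [0,1]^{M} : \textstyle\sum_{m=1}^{M} w_m = 1 \Big\},
```

where \(\hat{\mu}_m\) denotes the estimate produced by the \(m\)th candidate model. The frequentist question is how to choose \(w\) from the data so that \(\hat{\mu}(w)\) predicts well.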
Over the past decade, there has been substantial interest in the development of asymptotically optimal model averaging procedures, and various strategies have been proposed. See, for example, Mallows model averaging (Hansen 2007), optimal mean squared error averaging (Liang et al. 2011), jackknife model averaging (Hansen and Racine 2012), heteroskedasticity-robust C_p (Liu and Okui 2013), and optimal model averaging for linear mixed-effects models (Zhang, Zou, and Liang 2014). These methods are mainly developed for the linear framework. As far as we know, there is little work on developing optimal model averaging methods under generalized linear (mixed-effects) models.
In this article, we attempt to develop an optimal model averaging method for generalized linear models (GLMs), which presents two challenges absent from the linear setting. First, we need to develop a proper weight choice criterion. In the existing literature on optimal model averaging in the linear setting, the weight choice criterion is generally developed after one obtains an unbiased estimator of the squared prediction risk. This approach is difficult, if not impossible, to carry over here because of the generality of the GLM link function. Second, we wish to prove that the resulting model averaging estimator is asymptotically optimal, that is, that it minimizes the predictive error in large samples. However, the proof of asymptotic optimality is much more difficult than in the linear setting, where a commonly used tool is Theorem 2 of Whittle (1960). That theorem can no longer be applied because the model averaging estimator is not a linear function of the response variables.

CONTACT: Hua Liang, hliang@gwu.edu, Department of Statistics, George Washington University, Washington, DC, and Special Term Professor at School of Statistics, Capital University of Economics and Business, Beijing, China.
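In generic notation (a standard formulation in this literature, not quoted from this article), asymptotic optimality of a data-driven weight vector \(\hat{w}\) with respect to a loss \(L_n(\cdot)\) over a weight set \(\mathcal{W}\) means

```latex
\frac{L_n(\hat{w})}{\inf_{w \in \mathcal{W}} L_n(w)}
\;\xrightarrow{\;p\;}\; 1
\qquad \text{as } n \to \infty,
```

that is, the averaged estimator asymptotically attains the loss of the infeasible best weight vector; in the GLM setting of this article, \(L_n\) would be the KL loss.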
To address the first challenge, we use a plug-in estimator of the Kullback–Leibler (KL) loss plus a penalty term as the weight choice criterion, which is equivalent to penalizing the negative log-likelihood. It is interesting to note that this criterion reduces to the Mallows criterion (Hansen 2007) in the normal distribution situation. For the second challenge, we instead use the theory for consistency of estimators in misspecified models developed by White (1982) (see Equation (6) in Section 3 for details) to establish the asymptotic optimality, taking the view that all candidate models may be misspecified. We assume the number of candidate models to be finite. The asymptotic optimality is established for the dimension of covariates being either fixed or diverging.
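Schematically (the precise criterion appears later in the article; the display below is only an illustrative sketch in generic notation), such a criterion is a penalized negative log-likelihood, and in the normal linear case with known variance \(\sigma^2\) a criterion of this type collapses to Hansen's (2007) Mallows criterion:

```latex
\mathcal{C}(w) \;=\; -2\log L\big(\hat{\theta}(w)\big) \;+\; \mathrm{penalty}(w),
\qquad
C_n(w) \;=\; \big\| y - P(w)\,y \big\|^{2} \;+\; 2\sigma^{2}\,\mathrm{tr}\!\big(P(w)\big),
```

where \(P(w) = \sum_{m=1}^{M} w_m P_m\) averages the candidate models' hat matrices \(P_m\), so \(\mathrm{tr}(P(w))\) plays the role of an effective number of parameters under the weight \(w\).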
Under the framework of conditional inference (Jiang 1999), we further extend our work to generalized linear mixed-effects models (GLMMs) for analyzing nonnormal and nonindependent data. The notion of conditional Kullback–Leibler divergence is introduced and the corresponding weight choice criterion is developed. The asymptotic optimality is then investigated accordingly.
The remainder of this article is organized as follows. Section 2 introduces model averaging estimation for GLMs and proposes a weight choice method based on the KL loss.
© American Statistical Association