JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, VOL. , NO. , –, Theory and Methods http://dx.doi.org/./..

Optimal Model Averaging Estimation for Generalized Linear Models and Generalized Linear Mixed-Effects Models

Xinyu Zhang a, Dalei Yu b, Guohua Zou a, and Hua Liang c

a Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and School of Mathematical Science, Beijing, China; b Statistics and Mathematics College, Yunnan University of Finance and Economics, Kunming, China; c Department of Statistics, George Washington University, Washington, DC, USA

ARTICLE HISTORY Received August  Revised October 

KEYWORDS Asymptotic optimality; Kullback–Leibler loss; Misspecification; Penalized generalized weighted least squares (PGWLS); Prediction accuracy

ABSTRACT Considering model averaging estimation in generalized linear models, we propose a weight choice criterion based on the Kullback–Leibler (KL) loss with a penalty term. This criterion differs in principle from that for continuous observations, but reduces to the Mallows criterion in that situation. We prove that the corresponding model averaging estimator is asymptotically optimal under certain assumptions. We further extend our work to the generalized linear mixed-effects model framework and establish the associated theory. Numerical experiments illustrate that the proposed method is promising.

1. Introduction

Model averaging, unlike most variable selection studies, which focus on identifying important predictors, aims to improve the prediction accuracy given several predictors (Ando and Li 2014). Using variable selection (or model selection), we end up putting all our inferential eggs in one unevenly woven basket (Longford 2005), whereas model averaging is a smoothed extension of model selection that can substantially reduce risk relative to selection (Hansen 2014).
Bayesian model averaging (BMA) has long been a popular statistical technique; see Hoeting et al. (1999) for a comprehensive review of this literature. Unlike BMA, where models are usually weighted by their posterior model probabilities, the current article focuses on methods that determine the weights from a frequentist perspective.

Over the past decade, there has been a substantial amount of interest in the development of asymptotically optimal model averaging procedures, and various strategies have been proposed. See, for example, Mallows model averaging (Hansen 2007), optimal mean squared error averaging (Liang et al. 2011), jackknife model averaging (Hansen and Racine 2012), heteroskedasticity-robust Cp (Liu and Okui 2013), and optimal model averaging for linear mixed-effects models (Zhang, Zou, and Liang 2014). These methods are mainly developed for the linear framework. As far as we know, there is little work on developing optimal model averaging methods under generalized linear (mixed-effects) models.

In this article, we attempt to develop an optimal model averaging method for generalized linear models (GLMs), for which we immediately face two challenges not encountered in the linear setting. First, we need to develop a proper weight choice criterion. In the existing literature on optimal model averaging for the linear setting, the weight choice criterion is generally developed after one obtains an unbiased estimator of the squared prediction risk. This approach is difficult, if not impossible, to carry out here because of the generality of the GLM link function. Second, we expect to prove that the resultant model averaging estimator is asymptotically optimal, that is, it minimizes the predictive error in the large-sample case.

CONTACT Hua Liang hliang@gwu.edu Department of Statistics, George Washington University, Washington, DC , and Special Term Professor at School of Statistics, Capital University of Economics and Business, Beijing, China.
However, the proof of the asymptotic optimality is much more difficult than in the linear setting, where a commonly used tool is Theorem 2 of Whittle (1960). That theorem no longer applies here because the model averaging estimator is not a linear function of the response variables.

To address the first challenge, we use a plug-in estimator of the Kullback–Leibler (KL) loss plus a penalty term as the weight choice criterion, which is equivalent to penalizing the negative log-likelihood. It is interesting to note that this criterion reduces to the Mallows criterion (Hansen 2007) in the normal distribution situation. For the second challenge, we instead use the theory for consistency of estimators in misspecified models developed by White (1982) (see Equation (6) in Section 3 for details) to establish the asymptotic optimality, taking the view that all candidate models can be misspecified. We assume the number of candidate models to be finite. The asymptotic optimality is established for the dimension of the covariates being either fixed or diverging.

Under the framework of conditional inference (Jiang 1999), we further extend our work to generalized linear mixed-effects models (GLMMs) for analyzing nonnormal and nonindependent data. The notion of conditional Kullback–Leibler divergence is introduced and the corresponding weight choice criterion is developed. The asymptotic optimality is then investigated accordingly.

The remainder of this article is organized as follows. Section 2 introduces the model averaging estimation for GLMs and proposes a weight choice method based on the KL loss.

©  American Statistical Association
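For concreteness, the normal-distribution special case mentioned above (Hansen 2007) can be sketched numerically. The code below is an illustrative sketch, not part of the article: it minimizes the Mallows criterion C(w) = ||y - Fw||^2 + 2*sigma2*(k'w) over the weight simplex, where F collects the candidate models' fitted values and k their parameter counts. The helper name `mallows_weights` and the toy data are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def mallows_weights(y, fits, ks, sigma2):
    """Minimize the Mallows criterion over the unit simplex.

    fits   : (n, M) matrix of fitted values from M candidate models
    ks     : length-M vector of candidate-model parameter counts
    sigma2 : estimate of the error variance
    """
    M = fits.shape[1]
    crit = lambda w: np.sum((y - fits @ w) ** 2) + 2.0 * sigma2 * (ks @ w)
    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * M
    res = minimize(crit, np.full(M, 1.0 / M), bounds=bounds, constraints=cons)
    return res.x

# Toy illustration with two nested linear candidate models.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.1 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_small = X_full[:, :2]
fit = lambda X: X @ np.linalg.lstsq(X, y, rcond=None)[0]
fits = np.column_stack([fit(X_small), fit(X_full)])

# Variance estimate from the largest candidate model.
resid = y - fits[:, 1]
sigma2 = resid @ resid / (n - X_full.shape[1])

w = mallows_weights(y, fits, np.array([2.0, 3.0]), sigma2)
print(w)  # nonnegative weights summing to 1
```

In the GLM setting the paper replaces the squared-error term with a plug-in KL loss estimate, so this quadratic program is only the limiting normal case of the proposed criterion.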