Journal of Statistical Planning and Inference 138 (2008) 552–567
www.elsevier.com/locate/jspi

Efficient mean estimation in log-normal linear models

Haipeng Shen, Zhengyuan Zhu*
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

Received 6 April 2006; received in revised form 28 September 2006; accepted 13 October 2006
Available online 14 March 2007

*Corresponding author. Tel./fax: +1 919 843 2431. E-mail address: zhuz@email.unc.edu (Z. Zhu).
0378-3758/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2006.10.016

Abstract

Log-normal linear models are widely used in applications, and it is often of interest to predict the response variable, or to estimate its mean, on the original scale for a new set of covariate values. In this paper we consider the problem of efficiently estimating the conditional mean of the response variable on the original scale in log-normal linear models. We first review several existing estimators: the maximum likelihood (ML) estimator, the restricted ML (REML) estimator, the uniformly minimum variance unbiased (UMVU) estimator, and a bias-corrected REML estimator. We then propose two estimators that minimize the asymptotic mean squared error and the asymptotic bias, respectively. We also describe a parametric bootstrap procedure to obtain confidence intervals for the proposed estimators. Both the new estimators and the bootstrap procedure are very easy to implement. Simulation studies comparing the estimators suggest that our estimators perform better than the existing ones, and that the bootstrap procedure yields confidence intervals with good coverage properties. A real application, estimating the mean sediment discharge, is used to illustrate the methodology.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Maximum likelihood; Parametric bootstrap; Mean squared error; Uniformly minimum variance unbiased; Sediment discharge

1. Introduction

The prevalence of log-normality has been reported in a wide range of applications, from mining (Marcotte and Groleau, 1997), insurance reserve estimation (Doray, 1996), and water quality control (Gilliom and Helsel, 1986) to air pollution concentration monitoring (Holland et al., 2000) and sediment discharge estimation (Cohn, 1995; Elliott and Anders, 2004), to name just a few. Log-normal linear models are often used in these applications; in such models, a linear model is fitted to the log-transformed response variable. To fix ideas, let $Z = (Z_1, \ldots, Z_n)^T$ be the log-normal response vector, and let $x_i = (1, x_{i1}, \ldots, x_{ip})^T$ be the covariate vector for observation $i$. A log-normal linear model assumes that

$$Y = \log(Z) = X\beta + \varepsilon, \qquad (1)$$

where $X = (x_1, \ldots, x_n)^T$, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ with $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. In many cases, for a new set of covariate values $x_0$, one is interested in predicting the response variable on the original scale, $Z_0 = \exp(x_0^T \beta + \varepsilon_0)$,