The Variational Bayes Method For Inverse Regression Problems With an Application To The Palaeoclimate Reconstruction

Richa Vatsa and Simon Wilson
Dept. of Statistics, Trinity College Dublin, Dublin, Ireland
vatsar@tcd.ie and simon.wilson@tcd.ie

Abstract

The palaeoclimate reconstruction problem is presented as an example of an inverse regression problem. In the reconstruction problem, past climate is inferred from pollen data. Modern data are used to build a regression model of how pollen responds to climate. The inverse problem is to infer climate from data on ancient pollen prevalence. This inverse inference is challenging and computationally intensive. We demonstrate that the Variational Bayes (VB) method, which assumes conditional independence, provides fast solutions to the reconstruction problem. An advantage of VB is that many more climate variables can be included in the estimation without imposing a large computational burden. We explore the accuracy of the VB method and comment on its usefulness for inverse inference problems more generally.

Keywords: Inverse problem, palaeoclimate reconstruction, Variational Bayes method.

1 Introduction

Inverse problems form an important class of statistical inference problems, arising in fields ranging from geology to medical image processing and financial mathematics. The definition of an inverse problem is open to some interpretation, but broadly it refers to problems in which we have only indirect observations of an object (a function) that we wish to reconstruct. From a mathematical point of view, this usually corresponds to the inversion of some operator. A particular case of an inverse problem, and the motivating example for this paper, is the reconstruction of ancient climate from so-called 'proxy' data, such as ancient pollen that can be recovered from lake sediment.
In this case, it is natural to model the response of pollen as a function of climate, and then fit this model to modern data on both pollen and climate. The inference task is then to invert the fitted function, using ancient pollen data, to infer past climate. In common with many inverse problems, both the model fitting and the inversion involve a considerable computational burden. Bayesian approaches to this problem have used MCMC (Haslett et al., 2006), importance sampling for cross-validation (Bhattacharya, 2004) and functional approximations (Salter-Townshend, 2009). None of these approaches is without limitations. The MCMC approach suffered from poor mixing. The approach of Salter-Townshend (2009) used the integrated nested Laplace approximation (INLA) method of Rue et al. (2009), and so was restricted to models with a limited number of parameters. They also split the problem into two distinct stages: fitting to modern data and inversion.
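The two-stage scheme described above (fit a forward model of pollen given climate on modern data, then invert it with ancient pollen data) can be illustrated with a toy sketch. This is not the model used in the cited work, whose response models are considerably richer. It assumes, purely for illustration, that each of three hypothetical taxa responds linearly to a single climate variable with Gaussian noise, that taxa are conditionally independent given climate, and that the inversion uses a flat prior evaluated on a grid. The function names (`fit_linear`, `invert`, `simulate_and_reconstruct`), parameter values and simulated data are all hypothetical.

```python
import math
import random


def fit_linear(xs, ys):
    """Stage 1 (fitting): least-squares fit of y = a + b*x.

    Returns (a, b, sigma), where sigma is the residual standard deviation.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))
    return a, b, sigma


def invert(y_obs, models, grid):
    """Stage 2 (inversion): normalised posterior over climate on a grid.

    Uses a flat prior on the grid and treats taxa as conditionally
    independent given climate, so their Gaussian log-likelihoods add.
    """
    log_post = []
    for x in grid:
        lp = 0.0
        for y, (a, b, sigma) in zip(y_obs, models):
            lp += -0.5 * ((y - (a + b * x)) / sigma) ** 2 - math.log(sigma)
        log_post.append(lp)
    peak = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_and_reconstruct(true_climate=12.0, seed=1):
    """Hypothetical end-to-end demo on simulated data."""
    rng = random.Random(seed)
    # Hypothetical (intercept, slope) responses for three taxa.
    true_params = [(2.0, 1.0), (15.0, -0.5), (1.0, 0.8)]
    noise_sd = 0.5

    # Modern calibration data: climate and pollen observed together.
    modern_x = [rng.uniform(0.0, 20.0) for _ in range(200)]
    models = []
    for a, b in true_params:
        ys = [a + b * x + rng.gauss(0.0, noise_sd) for x in modern_x]
        models.append(fit_linear(modern_x, ys))

    # Ancient sample: pollen observed, climate unknown.
    y_obs = [a + b * true_climate + rng.gauss(0.0, noise_sd)
             for a, b in true_params]

    grid = [i * 0.05 for i in range(401)]  # climate values from 0.0 to 20.0
    posterior = invert(y_obs, models, grid)
    post_mean = sum(x * p for x, p in zip(grid, posterior))
    return models, grid, posterior, post_mean


if __name__ == "__main__":
    _, _, _, est = simulate_and_reconstruct()
    print(f"posterior mean climate: {est:.2f} (true value 12.0)")
```

Because the taxa are treated as conditionally independent given climate, each one simply adds a log-likelihood term in the inversion. The grid, however, grows exponentially with the number of climate variables, which illustrates why exhaustive inversion of this kind becomes expensive as more climate variables are added.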