Implicit Variational Inference: the Parameter and the Predictor Space

Yann Pequignot*, Mathieu Alain, Patrick Dallaire, Alireza Yeganehparast, Pascal Germain, Josée Desharnais, François Laviolette
Université Laval, Québec, Canada
October 27, 2020
arXiv:2010.12995v1 [cs.LG] 24 Oct 2020

* Email: yann-batiste.pequignot.1@ulaval.ca

Having access to accurate confidence levels along with predictions allows one to determine whether making a decision is worth the risk. Under the Bayesian paradigm, the posterior distribution over parameters is used to capture model uncertainty, valuable information that can be translated into predictive uncertainty. However, computing the posterior distribution for high-capacity predictors, such as neural networks, is generally intractable, making approximate methods such as variational inference a promising alternative. While most methods perform inference in the space of parameters, we explore the benefits of carrying out inference directly in the space of predictors. Relying on a family of distributions given by a deep generative neural network, we present two ways of carrying out variational inference: one in parameter space, one in predictor space. Importantly, the latter requires us to choose a distribution of inputs, therefore allowing us at the same time to explicitly address the question of out-of-distribution uncertainty. We explore from various perspectives the implications of working in the predictor space induced by neural networks as opposed to the parameter space, focusing mainly on the quality of uncertainty estimation for data lying outside the training distribution. We compare the posterior approximations obtained with these two methods to several standard methods and present results showing that variational approximations learned in the predictor space distinguish themselves positively from those trained in the parameter space.

1. INTRODUCTION

As more and more industries adopt artificial intelligence technologies, the requirements surrounding the development and deployment of AI-based systems are also evolving. In the medical sector and in aerospace, for instance, several exciting applications have been developed, but will not make it to the real world until guarantees are obtained that critical mistakes can be avoided or at least safely controlled (Bhattacharyya et al., 2015; Begoli et al., 2019). Quantification of uncertainty is one important mechanism to support such guarantees, thereby contributing to the safety of a system (Shafaei et al., 2018), as long as it is accurate and exhaustive. Of particular interest is the ability of a system to properly quantify uncertainty on out-of-distribution (OOD) instances, that is, when predicting on instances different from the observations it learns from.

Bayesian statistical tools are a common way to tackle this challenge (MacKay, 1992; Neal, 1996). The uncertainty over a model's parameters given a training set is captured within a conditional probability distribution often referred to as the posterior distribution. This approach is commonly extended to neural networks by expressing the posterior distribution over the network parameters (i.e., its weights and biases). However, in most cases, the inference of this distribution is difficult to carry out due to the non-linearity and the typically large parameter space of neural networks. Meanwhile, variational methods have gradually emerged as a time-efficient solution for obtaining an approx-
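For concreteness, the Bayesian setup described above can be written in its standard textbook form (this is the usual formulation, not an equation taken from this paper): given a prior p(θ) over the network parameters θ and a likelihood p(D | θ) of the training data D, Bayes' rule yields the posterior, whose normalizing constant is what makes exact inference intractable, and variational inference sidesteps it by maximizing a lower bound on the evidence.

```latex
% Posterior over parameters (Bayes' rule); the evidence p(D) is the
% intractable quantity for high-capacity predictors such as neural networks.
p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}

% Variational inference replaces p(theta | D) by a tractable q_phi(theta)
% chosen to maximize the evidence lower bound (ELBO):
\mathcal{L}(\phi)
  = \mathbb{E}_{q_\phi(\theta)}\!\left[\log p(\mathcal{D} \mid \theta)\right]
    - \mathrm{KL}\!\left(q_\phi(\theta) \,\|\, p(\theta)\right)
  \le \log p(\mathcal{D})
```

Maximizing the ELBO over φ is equivalent to minimizing KL(q_φ(θ) ‖ p(θ | D)), which is why the optimized q_φ serves as the posterior approximation.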