Improving Continuous Normalizing Flows using a Multi-Resolution Framework

Vikram Voleti 1,2  Chris Finlay 3,4  Adam Oberman 3,1  Christopher Pal 5,1,6

* Equal contribution. 1 Mila, 2 Université de Montréal, Canada, 3 McGill University, Canada, 4 Deep Render, 5 Polytechnique Montréal, Canada, 6 Canada CIFAR AI Chair. Correspondence to: Vikram Voleti <vikram.voleti@gmail.com>.

Third workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2021). Copyright 2021 by the author(s).

Abstract

Recent work has shown that Continuous Normalizing Flows (CNFs) can serve as generative models of images with exact likelihood calculation and invertible generation/density estimation. In this work we introduce a Multi-Resolution variant of such models (MRCNF). We introduce a transformation between resolutions that leaves the log likelihood unchanged. We show that this approach yields comparable likelihood values on various image datasets, with improved performance at higher resolutions, fewer parameters, and only one GPU.

1. Introduction

Reversible generative models built on the change-of-variables technique (Dinh et al., 2017; Kingma & Dhariwal, 2018; Ho et al., 2019; Yu et al., 2020) are attracting growing interest because they enable efficient density estimation, efficient sampling, and computation of exact likelihoods. A promising variation of this approach is the Continuous Normalizing Flow (CNF) (Chen et al., 2018; Grathwohl et al., 2019), which transforms a base distribution into the model distribution by integrating over continuous-time dynamics. CNFs have been shown to be capable of modelling complex distributions such as those associated with images. While this new paradigm for the generative modelling of images is not as mature as Generative Adversarial Networks (GANs) (Goodfellow et al., 2016) or Variational Autoencoders (VAEs) (Kingma & Welling, 2013) in terms of generated image quality, it is a promising direction of research.

Figure 1. The architecture of our MRCNF method (best viewed in color). Continuous normalizing flows (CNFs) g_s are used to generate images x_s from noise z_s at each resolution; flows at finer resolutions are conditioned (dashed lines) on the coarser image one level above, x_{s+1}, except at the coarsest level.

In this work, we focus on making the training of continuous normalizing flows feasible for higher-resolution images while reducing computation time. We thus introduce a novel multi-resolution technique for continuous normalizing flows, modelling the conditional distribution of high-level information at each resolution in an autoregressive fashion (the resulting sampling procedure is sketched in code at the end of this section). We show that this makes the models perform better at higher resolutions. A high-level view of our approach is shown in Figure 1. Our main contributions are:

1. We introduce Multi-Resolution Continuous Normalizing Flows (MRCNF), through which we achieve state-of-the-art bits-per-dimension (BPD, negative log likelihood per pixel) on ImageNet64 using fewer model parameters than comparable methods.

2. We propose a multi-resolution transformation that adds no cost in terms of likelihood.
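To make the coarse-to-fine, autoregressive sampling concrete, the following is a minimal PyTorch sketch of the data flow in Figure 1. The names (sample_mrcnf, toy_generator, the cond keyword) are illustrative assumptions rather than the paper's implementation; the actual per-resolution CNFs and the transformation between resolutions are defined in later sections.

```python
import torch
import torch.nn.functional as F

def sample_mrcnf(generators, coarse_shape):
    """Coarse-to-fine sampling as depicted in Figure 1: the coarsest CNF
    generates unconditionally; each finer CNF is conditioned on the image
    one level above."""
    z = torch.randn(coarse_shape)
    x = generators[0](z, cond=None)          # coarsest level: unconditional
    for g_s in generators[1:]:
        b, c, h, w = x.shape
        z = torch.randn(b, c, 2 * h, 2 * w)  # fresh noise on the finer grid
        x = g_s(z, cond=x)                   # condition on the coarser image
    return x

# Toy stand-in for a trained CNF sampler (not an actual flow):
def toy_generator(z, cond=None):
    if cond is None:
        return z
    return z + F.interpolate(cond, scale_factor=2, mode="nearest")

sample = sample_mrcnf([toy_generator] * 3, coarse_shape=(1, 3, 8, 8))
print(sample.shape)  # torch.Size([1, 3, 32, 32])
```

The stand-in only illustrates how conditioning threads resolutions together; a real g_s would integrate an ODE to map noise to an image at its resolution.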
2. Background

2.1. Normalizing Flows

Normalizing flows (Tabak & Turner, 2013; Jimenez Rezende & Mohamed, 2015; Dinh et al., 2017; Papamakarios et al., 2019; Kobyzev et al., 2020) are generative models that map a complex data distribution, such as that of real images, to a known noise distribution. They are trained by maximizing the log likelihood of their input images. Suppose a normalizing flow g produces output z = g(x) from an input image x.
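For reference, the log likelihood being maximized follows from the change-of-variables formula; the statement below is the standard one, written in the notation just introduced (a sketch of the setup, not a quotation of the paper's later equations):

```latex
\log p_x(\mathbf{x}) = \log p_z\big(g(\mathbf{x})\big)
  + \log \left| \det \frac{\partial g(\mathbf{x})}{\partial \mathbf{x}} \right|
```

Here p_z is the known noise distribution (typically a standard Gaussian), and the log-determinant term accounts for the change in volume under g.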