Internal Distribution Matching for Natural Image Retargeting

Assaf Shocher*   Shai Bagon*   Phillip Isola†   Michal Irani*
* Dept. of Computer Science and Applied Math, The Weizmann Institute of Science
† Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
Project Website: http://www.wisdom.weizmann.ac.il/~vision/ingan/

Abstract

Good visual retargeting changes the global size and aspect ratio of a natural image, while preserving the size and aspect ratio of all its local elements. We propose formulating this principle by requiring that the distribution of patches in the input match the distribution of patches in the output. We introduce a deep-learning approach for retargeting, based on an Internal GAN (InGAN). InGAN is an image-specific GAN: it incorporates the internal statistics of a single natural image into a GAN. It is trained on a single input image and learns the distribution of its patches. It is then able to synthesize natural-looking target images composed from the input image's patch distribution. InGAN is totally unsupervised, and requires no additional data other than the input image itself. Moreover, once trained on the input image, it can generate target images of any specified size or aspect ratio in real-time. [1]

1. Introduction

The ubiquity of digital displays of various sizes and aspect ratios poses a great challenge for digital media: any image should be readily retargeted to fit any size and aspect ratio. Good visual retargeting changes the global size and aspect ratio of a natural image, while preserving the size and aspect ratio of all its local elements; see Fig. 1 for examples. Simakov et al. formalized this idea via the notion of bidirectional similarity [19].
The output of a retargeting algorithm should exhibit coherence: the output image should contain only patches that are found in the input image. Vice versa, it should exhibit completeness: the input image should contain only patches that are found in the output. Thus, no artifacts are introduced into the retargeted image and no critical visual information is lost in the process.

One way to satisfy these criteria is to require that the distribution of patches in the input match the distribution of patches in the output. We therefore propose distribution matching as a new objective for visual retargeting, which goes beyond bidirectional similarity: it requires not only that all input patches can be found in the output, and vice versa, but also that the frequencies with which these patches occur match.

Reframing the problem as distribution matching has the advantage that we can immediately apply recent advances in adversarial learning. In particular, generative adversarial networks (GANs) can be understood as a tool for distribution matching [7]. A GAN maps data sampled from one distribution to transformed data that is indistinguishable from a target distribution: G : x → y, with x ∼ p_x and G(x) ∼ p_y. An image can be viewed as a set of samples from a distribution over patches, just as an image dataset can be viewed as a set of samples from a distribution over images. In the same way that we can learn a generative model of the images in a dataset, we can learn a generative model of the patches in a single image. Retargeting, framed as distribution matching, can therefore be achieved by training a GAN to map from an input image to an output image whose patch distribution is indistinguishable from the input's. Unlike most GANs, which map between two different distributions, ours is an automorphism at the patch level, G : x → x, with p_x being the distribution of patches in the input image.

[1] Code will be made publicly available.
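The patch-level view above can be illustrated with a minimal NumPy sketch. This is not the paper's method: the learned adversarial discriminator is replaced here by brute-force nearest-neighbour matching, and the helper names (`extract_patches`, `coherence_error`) are hypothetical. It only shows what "every output patch is found in the input" means operationally:

```python
import numpy as np

def extract_patches(img, k=3):
    """Collect all overlapping k-by-k patches of a 2-D image as flat vectors."""
    h, w = img.shape
    patches = [img[i:i + k, j:j + k].ravel()
               for i in range(h - k + 1)
               for j in range(w - k + 1)]
    return np.stack(patches)

def coherence_error(src, out, k=3):
    """Mean squared distance from each output patch to its nearest input patch.
    Zero means every output patch appears verbatim in the input (coherence)."""
    p_in, p_out = extract_patches(src, k), extract_patches(out, k)
    # pairwise squared distances, then nearest input patch per output patch
    d = ((p_out[:, None, :] - p_in[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()

rng = np.random.default_rng(0)
img = rng.random((16, 16))
# a crop keeps coherence perfectly: all of its patches exist in the input
print(coherence_error(img, img[:, :12]))  # -> 0.0
```

Completeness is the symmetric term, `coherence_error(out, src)`: it is nonzero whenever some input patch has no close match in the output. Distribution matching is stricter still, since it also constrains how often each patch occurs.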
We call such a GAN an "internal GAN" (InGAN), because it is trained to match the internal statistics of a single image [23]. Retargeting is achieved by modifying the size and aspect ratio of the output tensor, which changes the arrangement of patches, but not the distribution of patches.

Although this formulation is sufficient in theory to encourage both coherence and completeness, in practice we observe that completeness is often not achieved: many patches from the input image are omitted in the output. To ameliorate this, we introduce a second mechanism for encouraging completeness: it should be possible to reconstruct ("decode") the input image from the output, i.e., ‖F(G(x)) − x‖ should be small, where F is a second network trained to perform the reverse mapping. This objective encourages the mapping between input and retargeted output to be cycle-consistent [22], a desideratum that has

arXiv:1812.00231v1 [cs.CV] 1 Dec 2018
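The reconstruction objective ‖F(G(x)) − x‖ can be sketched as follows. This is only a toy illustration under loud assumptions: `resize_nn` (nearest-neighbour resampling) stands in for both the learned generator G and the learned decoder F, and `cycle_loss` is a hypothetical helper, not the paper's training loss:

```python
import numpy as np

def resize_nn(img, h, w):
    """Nearest-neighbour resize: a crude stand-in for the learned G and F."""
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[np.ix_(rows, cols)]

def cycle_loss(x, target_hw):
    """|F(G(x)) - x|: retarget x to target_hw with G, map back with F,
    and measure how much of the input survives the round trip."""
    y = resize_nn(x, *target_hw)    # G: retarget to a new size / aspect ratio
    x_rec = resize_nn(y, *x.shape)  # F: decode back to the input size
    return np.abs(x_rec - x).mean()

rng = np.random.default_rng(0)
x = rng.random((32, 32))
# doubling the width duplicates columns, so F can invert G exactly
print(cycle_loss(x, (32, 64)))  # -> 0.0
# halving the width discards columns, so reconstruction must fail somewhere
print(cycle_loss(x, (32, 16)))  # -> some value > 0
```

The point of the real objective is precisely to push this round-trip error down even when the retargeting is aggressive, forcing G to keep enough of the input's patch content for F to recover it.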