A Novel Framework for Selection of Generative Adversarial Networks for an Application

Tanya Motwani*, Manojkumar Parmar*†
* Robert Bosch Engineering and Business Solutions Private Limited, Bengaluru, India
† HEC Paris, Jouy-en-Josas Cedex, France

Abstract—Generative Adversarial Networks (GANs) are a current focal point of research. The body of knowledge on GANs is fragmented, leading to a trial-and-error approach when selecting an appropriate GAN for a given scenario. We provide a comprehensive summary of the evolution of GANs from their inception, addressing issues such as mode collapse, vanishing gradients, unstable training and non-convergence. We also compare various GANs from the application point of view, covering their behavior and implementation details. We propose a novel framework to identify candidate GANs for a specific use case based on architecture, loss, regularization and divergence. We discuss the application of the framework through an example and demonstrate a significant reduction in search space. This efficient way of determining potential GANs lowers the unit economics of AI development for organizations.

I. INTRODUCTION

Generative Adversarial Networks (GANs) are a category of generative models built upon game theory: a two-player minimax game [1]. A typical architecture consists of two neural networks, a discriminator and a generator. The generator transforms an input noise vector into a potentially high-dimensional data vector. The discriminator evaluates whether this vector is drawn from the original distribution. Based on the outcome, the generator learns to produce samples that resemble the original distribution. In this adversarial setup, improvements in one component come at the expense of the other.
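The two-player minimax game of [1] referenced above is standardly formalized by the value function

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

where the discriminator D maximizes V while the generator G minimizes it; at the saddle point, the distribution induced by G matches the data distribution p_data.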
GANs are one of the dominant methods for the generation of realistic and diverse examples in the domains of computer vision [2] [3] [4] [5], time-series synthesis [6] [7] [8] [9], natural language processing [10] [11] [12] [13], etc. They belong to the class of implicit models, which follow a likelihood-free inference approach [14]. Implicit probabilistic models enjoy additional modelling flexibility compared to classical probabilistic models [15]. These models generate images sampled from the learned distribution and do not provide any latent representation of the data samples. Over other, explicit generative models, GANs offer advantages such as parallel generation, universal approximation, better quality, sharp density estimation and an understanding of the structural hierarchy of samples. These properties have contributed to the immense popularity of GANs in the deep learning community, especially in the field of computer vision. Despite their successes, GANs remain difficult to train because the nature of their optimization results in a dynamic system: each time any parameter of either component, the discriminator or the generator, is modified, the system is destabilized. Current research is dedicated to the search for stable combinations of architectures, losses and hyperparameters for various applications such as image and video generation [16] [17] [18], domain adaptation [3] [19] [20] [21], speech synthesis [22] [23] [24], semantic photo editing [2] [25] [26], etc. While these models attain interesting results for particular applications, there is no thorough consensus or reference study available on which GAN performs better than others for a specific use case. In this paper, we aim to address this gap and narrow down the combinations of attributes for GANs through a technical framework.
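As a minimal sketch of what "implicit" and "likelihood-free" mean in the paragraph above (the generator `g` here is an arbitrary nonlinear map chosen purely for illustration, not any model from the paper): sampling from the model is trivial, but the induced density of the samples has no closed form.

```python
import numpy as np

rng = np.random.default_rng(42)

# A hypothetical generator g: pushes simple latent noise through a
# nonlinearity. Drawing samples x = g(z) is trivial, but the density of x
# has no closed form -- the defining trait of an implicit model.
def g(z):
    return np.tanh(2.0 * z) + 0.1 * z ** 3

z = rng.standard_normal(10_000)  # latent noise z ~ N(0, 1)
x = g(z)                         # samples from the implicit distribution

# Moments can still be estimated purely from the samples, even though
# no likelihood can be evaluated:
sample_mean, sample_std = x.mean(), x.std()
```

This is also why GANs are trained by comparing samples (via the discriminator) rather than by maximizing a likelihood.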
A. Article Structure

The paper is organized as follows: Section II highlights the concerns that arise while training GANs, followed by Section III, which gives an outline of popular loss-variants of GANs. Section IV presents a contrast between these GANs based on application, behavior and implementation. Section V defines the framework with the set of most commonly used architectures, loss functions, regularizations and divergence schemes. Section VI explicates the use of the framework through an example. The future research scope is underlined in Section VII, followed by a summary in Section VIII.

II. TRAINING ISSUES WITH CLASSIC GANS

Despite their progress and success, GANs are subject to a variety of difficulties during training. These mainly include mode collapse [27], optimization instability [28], vanishing gradients and non-convergence [29]. Furthermore, the methods that attempt to solve these issues depend on heuristics that are sensitive to small modifications. This makes it difficult to experiment with new models or to utilize existing ones for different applications. A solid understanding, with an emphasis on both their theoretical and practical perspectives, is needed to curate research directions towards addressing them.

A. Mode Collapse

A probability distribution may be multimodal and consist of multiple peaks over different regions of the sample space. Mode collapse, a failure of GANs to model a multimodal distribution, occurs when the generator places its probability density in only a small area of the data space. The generator focuses on the creation of new data, while the discriminator’s objective is

arXiv:2002.08641v2 [cs.LG] 17 May 2021
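The mode-collapse failure described above can be made concrete with a toy diagnostic (all distributions and thresholds here are hypothetical choices for illustration, not part of the paper's framework): a bimodal target versus a collapsed generator that covers only one mode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target distribution: a bimodal mixture of Gaussians at -3 and +3.
real = np.concatenate([rng.normal(-3, 0.5, 5_000),
                       rng.normal(3, 0.5, 5_000)])

# A hypothetical collapsed generator: all of its density sits near one mode.
fake = rng.normal(3, 0.5, 10_000)

def mode_coverage(samples, centres=(-3.0, 3.0), radius=1.5):
    """Fraction of samples falling within `radius` of each mode centre."""
    return tuple(float(np.mean(np.abs(samples - c) < radius)) for c in centres)

real_cov = mode_coverage(real)  # roughly (0.5, 0.5): both modes covered
fake_cov = mode_coverage(fake)  # roughly (0.0, 1.0): one mode is dropped
```

A generator whose per-mode coverage deviates sharply from the target's, as `fake_cov` does here, is placing its probability density in a small area of the data space, which is precisely the collapse behavior defined above.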