Seeding Diversity into AI Art

Paper type: Technical Paper

Marvin Zammit, Antonios Liapis and Georgios N. Yannakakis
Institute of Digital Games, University of Malta, MSD2080, Malta
{marvin.zammit,antonios.liapis,georgios.yannakakis}@um.edu.mt

Abstract

This paper argues that generative art driven by conformance to a visual and/or semantic corpus lacks the necessary criteria to be considered creative. Among several issues identified in the literature, we focus on the fact that generative adversarial networks (GANs) that create a single image, in a vacuum, lack a concept of novelty regarding how their product differs from previously created ones. We envision that an algorithm which combines the novelty preservation mechanisms of evolutionary algorithms with the power of GANs can deliberately guide its creative process towards output that is both good and novel. In this paper, we build on recent advances in image generation from semantic prompts via OpenAI's CLIP model, interrupting the GAN's iterative process with short cycles of evolutionary divergent search. The results of evolution are then used to continue the GAN's iterative process; we hypothesise that this intervention will lead to more novel outputs. We test our hypothesis using novelty search with local competition, a quality-diversity evolutionary algorithm that can increase visual diversity while maintaining quality in the form of adherence to the semantic prompt, and explore how different notions of visual diversity affect both the process and the product of the algorithm. Results show that even a simplistic measure of visual diversity can help counter a drift towards similar images caused by the GAN. This first experiment opens a new direction for introducing higher intentionality and a more nuanced drive for GANs.

Introduction

Visual art is among the most well-researched domains in computational creativity, as it is perhaps the most recognisable among tasks which, when performed by humans, are deemed creative (Ritchie 2007). Painting in any style or medium requires some degree of skill (Colton 2008), and endowing machines with painting skill has a long and exciting history (Cohen 2017; Colton 2012; Lindemeier et al. 2015; Machado and Cardoso 2002). A watershed moment in this endeavour has been the advent of Generative Adversarial Networks (GANs) (Goodfellow et al. 2014), which not only started to bridge the gap between human and machine performance but also allowed novices to generate compelling images without extensive technical knowledge, development effort, or access to specialised hardware. Generative art produced through deep learned models has taken the world by storm in the last five years. The strength of models trained on vast image databases at producing highly typical content, such as human faces, has led to an almost ubiquitous fascination among researchers, artists, laymen, media, and speculators. We follow McCormack, Gifford, and Hutchings (2019) and refer to visuals generated via deep learning as "AI Art" in this paper.

As the general public became more interested in AI Art, a crucial component for the perception of creativity became whether the software could explain, in natural language, the framing information regarding what it was trying to portray (Colton, Charnley, and Pease 2011). While several GAN architectures addressed the generation of images from text prompts (Reed et al. 2016; Zhang et al. 2017), they performed well only on limited datasets and could not scale to generate visuals based on broader themes. The recent introduction of OpenAI's Dall-E (Ramesh et al. 2021) demonstrated an unprecedentedly high correspondence between a given text prompt and the generated image across a wide range of prompts. While neither the Dall-E model nor its training dataset have been publicly released at the time of writing, a pre-trained Contrastive Language-Image Pretraining (CLIP) model is available (Radford et al. 2021). The release of CLIP energised researchers and enthusiasts alike, leading to many open-source projects and Twitter bots that take advantage of the links between semantics and images to produce more convincing AI Art, such as album titles and covers [1].
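As an illustration of the kind of link between semantics and images that such projects exploit, the following Python sketch scores a candidate image against a text prompt with the publicly released CLIP model; the prompt, the image file and the ViT-B/32 backbone are illustrative placeholders, not the setup used in this paper.

# Minimal sketch (not this paper's implementation): score a candidate image
# against a text prompt using the released CLIP model
# (https://github.com/openai/CLIP). Prompt, file name and backbone are
# illustrative placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder inputs: a semantic prompt and a (e.g. GAN-generated) image.
text = clip.tokenize(["an oil painting of a lighthouse at dusk"]).to(device)
image = preprocess(Image.open("candidate.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Cosine similarity between the embeddings acts as a prompt-adherence score,
# i.e. how "typical" the image is for the given semantics.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = (image_features @ text_features.T).item()
print(f"CLIP similarity: {similarity:.3f}")

A similarity score of this kind can serve as the measure of adherence to the semantic prompt referred to in the abstract.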
In the context of computational creativity, however, it would be easy to argue that images generated only to conform to the patterns of the corpus fall under "mere generation" (Ventura 2016) and lack authenticity (McCormack, Gifford, and Hutchings 2019). Using the criteria of novelty, quality and typicality for the products of a creative process (Ritchie 2007), we argue that GANs and similar architectures target only typicality by conforming to patterns discovered in their training corpus. While we appreciate that there are several issues, such as intent and attribution (McCormack, Gifford, and Hutchings 2019), that AI Art should address before it can be considered creative, in this paper we focus on the novelty of the product by endowing the algo-

[1] https://twitter.com/ai_metal_bot