Cartesian Genetic Programming: Why No Bloat?

Andrew James Turner and Julian Francis Miller
Electronics Department, University of York, Heslington, York YO10 5DD, UK
{andrew.turner,julian.miller}@york.ac.uk

Abstract. For many years now it has been known that Cartesian Genetic Programming (CGP) does not exhibit program bloat. Two possible explanations have been proposed in the literature: neutral genetic drift and length bias. This paper empirically disproves both of these and thus reopens the question as to why CGP does not suffer from bloat. It has also been shown for CGP that using a very large number of nodes considerably increases the effectiveness of the search. This paper also proposes a new explanation as to why this may be the case.

1 Introduction

Bloat, the uncontrolled growth in program size, is a serious issue for Genetic Programming (GP) that has received much study [1] [2]. However, bloat does not appear in Cartesian Genetic Programming (CGP) [3]. In the literature there are two possible theories as to why CGP does not exhibit bloat: Neutral Genetic Drift (NGD) [3] and length bias [4]. This paper introduces both of these theories and then proceeds to empirically disprove them by removing the underlying assumptions each of them makes. This leaves us with no explanation for the lack of bloat in CGP and opens the topic for further investigation.

The investigations also show that there is an evolutionary pressure to increase the program size when the current program size is insufficient¹ to solve a given task. Conversely, we find empirically that there is no evolutionary pressure to decrease the program size if the current program size is much larger than required to solve a given task. It therefore appears that using large program sizes is not detrimental to CGP, in keeping with previous results [5], which show it is actually beneficial. A new hypothesis is presented as to why this is the case.
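To make "program size" concrete for CGP, the Python sketch below decodes a single-row CGP genotype with arity-2 nodes and unrestricted levels-back. It is an illustration under our own assumptions, not the authors' code. The genotype length is fixed by the number of nodes, while program size is taken here to be the number of active nodes, i.e. those reachable from the output genes. The class and helper names (`Genotype`, `random_genotype`, `active_nodes`, `evaluate`, `active_gene_fraction`) and the three-function set are hypothetical. `active_gene_fraction` gives the probability that a uniformly chosen point mutation lands on a gene that can change the decoded program.

```python
import random
from dataclasses import dataclass

# Hypothetical function set chosen for illustration; every node has arity 2.
FUNCTIONS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
]
ARITY = 2


@dataclass
class Genotype:
    n_inputs: int  # addresses 0..n_inputs-1 are program inputs
    nodes: list    # node i (address n_inputs + i) is [function, conn_1, conn_2]
    outputs: list  # output genes: addresses of program inputs or nodes


def random_genotype(n_inputs, n_nodes, n_outputs, rng):
    """Build a random single-row genotype (assumes n_inputs >= 1)."""
    nodes = []
    for i in range(n_nodes):
        max_addr = n_inputs + i  # feed-forward: only inputs or earlier nodes
        nodes.append([rng.randrange(len(FUNCTIONS))]
                     + [rng.randrange(max_addr) for _ in range(ARITY)])
    outputs = [rng.randrange(n_inputs + n_nodes) for _ in range(n_outputs)]
    return Genotype(n_inputs, nodes, outputs)


def active_nodes(g):
    """Indices of nodes reachable from the output genes (the phenotype)."""
    active = set()
    stack = [addr for addr in g.outputs if addr >= g.n_inputs]
    while stack:
        idx = stack.pop() - g.n_inputs
        if idx in active:
            continue
        active.add(idx)
        stack.extend(a for a in g.nodes[idx][1:] if a >= g.n_inputs)
    return active


def evaluate(g, inputs):
    """Execute only the active nodes; inactive nodes are never computed."""
    values = {i: x for i, x in enumerate(inputs)}
    for idx in sorted(active_nodes(g)):  # earlier nodes feed later ones
        f, a, b = g.nodes[idx]
        values[g.n_inputs + idx] = FUNCTIONS[f](values[a], values[b])
    return [values[o] for o in g.outputs]


def active_gene_fraction(g):
    """Share of genes whose point mutation can change the decoded program.

    Output genes and the genes of active nodes count as active; mutating
    any other gene leaves the phenotype unchanged.
    """
    genes_per_node = 1 + ARITY
    total = len(g.nodes) * genes_per_node + len(g.outputs)
    active = len(active_nodes(g)) * genes_per_node + len(g.outputs)
    return active / total


if __name__ == "__main__":
    rng = random.Random(0)
    for n_nodes in (10, 100, 1000):
        samples = [active_gene_fraction(random_genotype(2, n_nodes, 1, rng))
                   for _ in range(200)]
        print(n_nodes, sum(samples) / len(samples))
```

Averaging `active_gene_fraction` over random genotypes of increasing length is one simple way to inspect how the share of inactive genes changes with the number of nodes; the sketch does not reproduce any experiment reported in the paper.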
When subject to a mutation operator, using a large number of nodes causes the fitness of an individual to vary, on average, by a smaller amount than when using a smaller number of nodes. Using a large number of nodes thus smooths out the fitness landscape, making it easier to navigate. This accords with the desirability of synonymous redundancy in representations introduced by Goldberg and

¹ This is compatible with the length bias theory [4], as is discussed later.

M. Nicolau et al. (Eds.): EuroGP 2014, LNCS 8599, pp. 222–233, 2014. © Springer-Verlag Berlin Heidelberg 2014