On the Limiting Distribution of Program Sizes in Tree-based Genetic Programming

R. Poli, Department of Computer Science, University of Essex, UK
W. B. Langdon, Department of Computer Science, University of Essex, UK
Stephen Dignum, Department of Computer Science, University of Essex, UK

Department of Computer Science, University of Essex
Technical Report CSM-464, ISSN: 1744-8050
December 2006

Abstract

We provide strong theoretical and experimental evidence that standard sub-tree crossover with uniform selection of crossover points pushes a population of $a$-ary GP trees towards a distribution of tree sizes of the form

$$\Pr\{n\} = (1 - a p_a) \binom{an+1}{n} (1 - p_a)^{(a-1)n+1} \, p_a^{n},$$

where $n$ is the number of internal nodes in a tree and $p_a$ is a constant. This result generalises the result previously reported in [7, 10, 8, 9] for the case $a = 1$.

1 Introduction

For most problems the ratio between the size of the search space and the number of acceptable solutions grows exponentially with the size of the problem. So, even with today's powerful computers, for many problems one can hope to find solutions with a particular search algorithm only if the algorithm is biased in such a way as to preferentially sample the areas of the search space where solutions are denser. This situation is often informally referred to as an algorithm being well-matched to a problem.

Naturally, having a full characterisation of the search biases of a search algorithm is a precondition for understanding whether or not the algorithm is well-matched to a problem. (The second precondition is the availability of a characterisation of the problem, e.g., information on the distribution of solutions in the search space.)

In evolutionary algorithms this requires understanding the biases of the genetic operators. These biases are fairly well understood for mutation and crossover in the case of fixed-length representations (e.g., binary GAs) and for selection (which is representation independent).
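As a concrete illustration of the size distribution stated in the abstract, the following minimal Python sketch evaluates Pr{n} for given arity and constant, and numerically checks that the probabilities sum to approximately one. The function names `limiting_size_pmf` and `total_mass`, the truncation point `n_max`, and the restriction 0 <= p_a < 1/a (which keeps the leading factor 1 - a p_a positive) are assumptions made for this example and are not taken from the report.

```python
from math import comb


def limiting_size_pmf(n: int, a: int, p_a: float) -> float:
    """Pr{n} for a-ary trees with n internal nodes, using the abstract's formula.

    Assumes (hypothetically, for this sketch) that 0 <= p_a < 1/a.
    """
    if n < 0:
        return 0.0
    return (
        (1 - a * p_a)
        * comb(a * n + 1, n)                 # binomial coefficient C(an+1, n)
        * (1 - p_a) ** ((a - 1) * n + 1)
        * p_a ** n
    )


def total_mass(a: int, p_a: float, n_max: int = 200) -> float:
    """Sum Pr{n} for n = 0..n_max; n_max is an illustrative truncation point.

    n_max is kept modest so the binomial coefficient still fits in a float.
    """
    return sum(limiting_size_pmf(n, a, p_a) for n in range(n_max + 1))


if __name__ == "__main__":
    # Illustrative parameter choices only; each pair satisfies p_a < 1/a.
    for a, p_a in [(1, 0.5), (2, 0.3), (3, 0.2)]:
        print(a, p_a, total_mass(a, p_a))
```

For a = 1 the expression reduces to (n + 1)(1 - p_a)^2 p_a^n, which gives a quick sanity check of the sketch against the linear-tree case mentioned in the abstract.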
However, the situation is much sketchier for variable-length representations. In particular, except for the limiting case of linear trees (built using only arity-1 primitives and terminals), we still know very little about the search biases of standard GP crossover.

In this paper we provide an exact characterisation of the limiting distribution of tree sizes towards which sub-tree crossover, when acting on its own, pushes the population. As we will see,