Using Subtree Crossover Distance to Investigate Genetic Programming Dynamics

Leonardo Vanneschi 1, Steven Gustafson 2, and Giancarlo Mauri 1

1 Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co.), University of Milano-Bicocca, 20126 Milan, Italy
2 School of Computer Science & IT, University of Nottingham, Jubilee Campus, Wollaton Rd., Nottingham, NG8 1BB, United Kingdom

Abstract. To analyse various properties of the search process of genetic programming it is useful to quantify the distance between two individuals. Using operator-based distance measures can make this analysis more accurate and reliable than using distance measures which have no relationship with the genetic operators. This paper extends a recent definition of a distance measure based on subtree crossover for genetic programming. Empirical studies are presented that show the suitability of this measure to dynamically calculate the fitness distance correlation coefficient during the evolution, to construct a fitness sharing system for genetic programming and to measure genotypic diversity in the population. These experiments confirm the accuracy of the new measure and its consistency with the subtree crossover genetic operator.

1 Introduction

Tree-based genetic programming (GP) uses transformation operators on tree structures [1] to carry out search. These operators define a neighbourhood structure over the trees. To analyse various dynamics of the GP search process, it is useful to quantify the distance between two trees in this topological space. For example, the distance between trees is useful if we want to monitor population diversity (see for instance [2–7]) or if we want to calculate well-known measures of problem hardness such as fitness distance correlation (FDC) (see among others [8–11]). Operator-based distance measures can make calculating distance and the analysis of the search process more accurate [10, 11, 2–5].
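To illustrate where a distance measure enters the FDC computation mentioned above, the following minimal sketch computes FDC in its usual form: the Pearson correlation between the fitness of each sampled individual and its distance to the nearest global optimum. The function name and the input lists are hypothetical and chosen only for illustration. The distances are assumed to have been produced beforehand by whichever tree distance is under study, and the experimental setup used in this paper may differ in its details.

```python
import math


def fitness_distance_correlation(fitnesses, distances):
    """Pearson correlation between fitness values and distances to the optimum.

    Hypothetical helper: `fitnesses[i]` is the fitness of sampled individual i,
    `distances[i]` is its precomputed distance to the nearest global optimum.
    """
    n = len(fitnesses)
    if n != len(distances) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")

    mean_f = sum(fitnesses) / n
    mean_d = sum(distances) / n

    # Population covariance and standard deviations (divide by n).
    cov = sum((f - mean_f) * (d - mean_d)
              for f, d in zip(fitnesses, distances)) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in fitnesses) / n)
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in distances) / n)

    if std_f == 0.0 or std_d == 0.0:
        raise ValueError("FDC is undefined when either sample has zero variance")
    return cov / (std_f * std_d)
```

Because the distance values are an input, changing the underlying tree distance changes the coefficient, which is why the choice of distance matters for this kind of analysis.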
The difficulty in defining operator-based distance measures was highlighted in [12]. Defining a distance measure, or a measure of similarity, that is, in some sense “bound” to (or “consistent” with) the genetic operators informally means that if two trees are close to each other, or similar, one can be transformed into the other in a few applications of the operator(s). Mutation-based distance measures for GP have been defined, the most common being some variations on the Levenshtein edit distance [3] and the structural distance [7]. In [12], Gustafson and Vanneschi first defined the notion of a subtree crossover based pseudo-distance measure. In this paper, we extend and generalise that definition and we experimentally show the usefulness of this new distance measure to analyse some properties of the search process.
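To make the idea of an edit-style distance between trees concrete, the sketch below computes the classic Levenshtein distance between the preorder label sequences of two trees. This is a deliberate simplification for illustration only: it is neither the tree edit distance variant of [3], nor the structural distance of [7], nor the subtree crossover based measure developed in this paper. The nested-tuple tree encoding and the helper names are assumptions.

```python
def preorder(tree):
    """Flatten a tree into its preorder list of labels.

    Assumed encoding: an internal node is a tuple (label, child, child, ...),
    a leaf is any non-tuple value.
    """
    if isinstance(tree, tuple):
        label, *children = tree
        labels = [label]
        for child in children:
            labels.extend(preorder(child))
        return labels
    return [tree]


def levenshtein(a, b):
    """Unit-cost edit distance (insert, delete, substitute) between sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Two small expression trees that differ in a single leaf.
t1 = ("+", "x", ("*", "x", "y"))
t2 = ("+", "x", ("*", "y", "y"))
d = levenshtein(preorder(t1), preorder(t2))  # d == 1
```

A small value here corresponds to few point edits, which mirrors the effect of mutation rather than of subtree crossover. A crossover-consistent measure has to account for the exchange of whole subtrees instead.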