GenCo: A project report

Penousal Machado
ISEC – Instituto Superior de Engenharia de Coimbra
CISUC – Centro de Informática e Sistemas da Universidade de Coimbra
Dep. de Engenharia Informática
3030 Coimbra, Portugal
machado@dei.uc.pt

André Dias
CISUC - Centro de Informática e Sistemas da Universidade de Coimbra
adias@student.dei.uc.pt

Amílcar Cardoso
CISUC - Centro de Informática e Sistemas da Universidade de Coimbra
DEI - Polo II da Universidade de Coimbra, Dep. de Engenharia Informática
3030 Coimbra, Portugal
amilcar@dei.uc.pt

Abstract

Genetic Programming (GP) involves the evolution of computer programs, which are usually represented by trees composed of functions and terminals. In order to assign fitness, one must evaluate the programs, which is the most time-consuming step of GP. In today's standard approaches, the evaluation involves an interpretation step. To avoid this step, which significantly slows the algorithm, some researchers directly evolve machine code programs. An alternative approach is to build a Genome Compiler, i.e. a system that transforms the individuals' trees into machine code programs and executes this code. Both techniques can bring huge speed improvements. However, these approaches have some shortcomings. In this paper we present GenCo: a research project whose main goal is the development of a Genetic Programming Genome Compiler system that overcomes some of the drawbacks of current approaches, enabling large speed improvements in a wider range of domains. We also present experimental results in a programmatic compression task, in which GenCo was, on average, 80 times faster than a standard C based GP system.

1. Introduction

GP is one of the most recent Evolutionary Computation techniques. Its goal is to evolve populations of computer programs, which improve automatically as evolution progresses [Banzhaf 98].
Due to the outstanding influence of Koza's seminal book, "Genetic Programming: On the Programming of Computers by Means of Natural Selection" [Koza 92], it is common, within the Machine Learning community, to associate the term GP with the evolution of tree structures (even when the trees are not interpreted as computer programs). In this paper we follow this "classical" definition. Therefore, when we talk about GP we are talking about the evolution of tree structures, which are built from a set of functions (f-set) and a set of terminals (t-set). The internal nodes of the tree are members of the f-set, and the leaves are members of the t-set.

The interest in GP is growing rapidly, which is easily explained if we take into account that automatic programming is expected to be one of the most important tasks in computer science research over the next twenty years [Banzhaf 98]. The speed and capability of computer hardware have increased exponentially. However, software development has been unable to keep up with this growth, and the gap is still increasing. Additionally, the demand for new software is also growing exponentially, but there isn't enough humanpower to respond to this demand. The process of writing code is simply too slow.

GP has already achieved human-competitive results in a wide variety of fields. However, the applicability of GP to complex problems and real-life situations is still undermined by the computational cost of the GP process. To overcome this problem, researchers have frequently resorted to the use of massively parallel computers; the problem is that most researchers cannot afford one.

According to [Nordin 94], about 99% of the time is spent on the evaluation of the individuals. In problems such as symbolic regression, in which the individuals must be evaluated for a set of fitness cases, this number becomes even higher.
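
To make the tree representation and the cost of interpreted evaluation concrete, the following is a minimal sketch in Python. It is not the GenCo implementation: the nested-tuple encoding, the names `F_SET`, `interpret` and `fitness`, the three arithmetic functions, and the target polynomial are all assumptions chosen for illustration. It shows that an interpreted evaluation walks the whole tree, dispatching on every node, once per fitness case.

```python
import operator

# Hypothetical function set (f-set): maps a function name to its implementation.
F_SET = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def interpret(node, x):
    """Evaluate a tree recursively for one value of the variable x."""
    if isinstance(node, tuple):
        # Internal node: (function name, child, child), a member of the f-set.
        fn = F_SET[node[0]]
        return fn(*(interpret(child, x) for child in node[1:]))
    # Leaf: a member of the (hypothetical) t-set, the variable "x" or a constant.
    return x if node == "x" else node


def fitness(tree, cases):
    """Sum of absolute errors over all fitness cases (lower is better)."""
    return sum(abs(interpret(tree, x) - target) for x, target in cases)


# An individual encoding x*x + x + 1.
individual = ("add", ("mul", "x", "x"), ("add", "x", 1.0))

# Fitness cases for a symbolic regression task with the same target function.
cases = [(float(x), float(x * x + x + 1)) for x in range(-5, 6)]

print(fitness(individual, cases))
```

In this sketch every call to `fitness` interprets the tree once per fitness case, so the per-node dispatch overhead is multiplied by the number of cases, the population size and the number of generations.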
Thus, if we can achieve significant speed improvements in the evaluation step, we will also get significant overall speed improvements.

The first GP systems were implemented in LISP; over time, researchers gradually moved to compiled languages. Nowadays, C and C++ based systems are the most popular ones, and are roughly 25 times faster than LISP-based ones. In 1994 Nordin proposed the direct evolution of machine code programs. His system achieved unprecedented speeds, being approximately 60 times faster than standard C based GP. Fukunaga [98] proposes an alternative way of getting significant speed improvements, namely: compile the individuals online and execute the resulting code. This kind of system was named a Genome Compiler, and is about 50 times faster than standard C based GP [Fukunaga 98]. The main advantage of this kind of system over Nordin's