M. Keijzer et al. (Eds.): EuroGP 2005, LNCS 3447, pp. 73-83, 2005. © Springer-Verlag Berlin Heidelberg 2005

Evolving L-Systems to Capture Protein Structure Native Conformations

Gabi Escuela 1, Gabriela Ochoa 2 and Natalio Krasnogor 3

1,2 Department of Computer Science, Simon Bolivar University, Caracas, Venezuela
gabiescuela@netuno.net.ve, gabro@ldc.usb.ve
3 School of Computer Science and I.T., University of Nottingham
Natalio.Krasnogor@nottingham.ac.uk

Abstract. A protein is a linear chain of amino acids that folds into a unique functional structure, called its native state. In this state, proteins show repeated substructures such as alpha helices and beta sheets. This suggests that native structures may be captured by the formalism known as Lindenmayer systems (L-systems). In this paper an evolutionary approach is used as the inference procedure for folded structures on simple lattice models. The algorithm searches the space of L-systems, which are then executed to obtain the phenotype; our approach is thus close to Grammatical Evolution. The problem is to find a set of rewriting rules that represents a target native structure on the lattice model. The proposed approach has produced promising results for short sequences. Thus the foundations are set for a novel encoding based on L-systems for evolutionary approaches to both the Protein Structure Prediction and Inverse Folding Problems.

1 Introduction

The Protein Structure Prediction Problem (PSP) is among the most outstanding open problems in Biochemistry. A successful approach for efficient and accurate prediction would hasten a new era for biotechnology. A protein is a linear sequence of units, called amino acids, that under certain physical conditions folds into a unique functional structure known as the native state or tertiary structure.
This native state is the key to understanding a protein's functionality in a living organism as an enzyme or as a storage, transport, messenger, antibody, or regulation molecule. The simplest models for studying the properties of protein folding and structure prediction are based on lattices (of 2 or 3 dimensions); these models capture the essential aspects of the folding process while keeping computational costs low. The on-lattice hydrophobic-hydrophilic (HP) model assumes that the hydrophobic effect of amino acids is the main force governing folding. The correspondence between amino acids and positions within a lattice is called an embedding of the protein. It was shown that finding the optimal embedding of a protein is NP-hard even for very simple lattice models [7,33]. Thus, the use of heuristics and approximation algorithms has become the most promising approach for the PSP. In particular, several evolutionary algorithms have been suggested for solving this problem [12,18,19,27,34]. All these approaches employ a direct encoding of the folded chain (see Section 2).
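
To make the notions of embedding and direct encoding concrete, the following minimal Python sketch places an HP sequence on the 2D square lattice and scores the resulting fold. It rests on several assumptions that are not taken from this paper: the absolute move alphabet U/D/L/R, the commonly used energy convention of -1 for each pair of H residues that are lattice neighbours but not consecutive in the chain, and the function names `embed`, `is_self_avoiding` and `hp_energy`, which are purely illustrative.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]

# Assumed direct encoding: one absolute move per bond on the 2D square lattice.
MOVES: Dict[str, Coord] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def embed(moves: str) -> List[Coord]:
    """Map a move string to lattice positions, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def is_self_avoiding(coords: List[Coord]) -> bool:
    """A valid embedding never places two residues on the same lattice site."""
    return len(set(coords)) == len(coords)


def hp_energy(sequence: str, moves: str) -> int:
    """Return minus the number of non-consecutive H-H lattice contacts (assumed convention)."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = embed(moves)
    if not is_self_avoiding(coords):
        raise ValueError("fold is not self-avoiding")
    index_at = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Inspect only the right and upper neighbours so each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


if __name__ == "__main__":
    # A four-residue chain folded into a unit square: the two end residues touch.
    print(hp_energy("HPPH", "RUL"))
```

In this encoding a candidate solution is simply the move string, so an evolutionary algorithm with a direct encoding would mutate and recombine such strings and use a score like `hp_energy` as (part of) its fitness.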