M. Keijzer et al. (Eds.): EuroGP 2005, LNCS 3447, pp. 73–83, 2005.
© Springer-Verlag Berlin Heidelberg 2005
Evolving L-Systems to Capture Protein Structure Native Conformations
Gabi Escuela¹, Gabriela Ochoa², and Natalio Krasnogor³
¹,² Department of Computer Science, Simon Bolivar University, Caracas, Venezuela
gabiescuela@netuno.net.ve, gabro@ldc.usb.ve
³ School of Computer Science and I.T., University of Nottingham
Natalio.Krasnogor@nottingham.ac.uk
Abstract. A protein is a linear chain of amino acids that folds into a unique
functional structure, called its native state. In this state, proteins show repeated
substructures such as alpha helices and beta sheets. This suggests that native
structures may be captured by the formalism known as Lindenmayer systems
(L-systems). In this paper an evolutionary approach is used as the inference
procedure for folded structures on simple lattice models. The algorithm searches
the space of L-systems, which are then executed to obtain the phenotype; our
approach is thus close to Grammatical Evolution. The problem is to find a set of
rewriting rules that represents a target native structure on the lattice model. The
proposed approach has produced promising results for short sequences. Thus the
foundations are set for a novel encoding based on L-systems for evolutionary
approaches to both the Protein Structure Prediction and Inverse Folding Problems.
1 Introduction
The Protein Structure Prediction Problem (PSP) is among the most outstanding open
problems in Biochemistry. A successful approach for efficient and accurate prediction
would usher in a new era for biotechnology. A protein is a linear sequence of units,
called amino acids, that under certain physical conditions folds into a unique
functional structure known as the native state or tertiary structure. This native state
is the key to understanding a protein's function in a living organism as an enzyme or
as a storage, transport, messenger, antibody, or regulation molecule. The simplest
models for studying the properties of protein folding and structure prediction are
based on lattices (of 2 or 3 dimensions); these models capture the essential aspects of
the folding process while keeping computational costs low. The on-lattice
hydrophobic-hydrophilic (HP) model assumes that the hydrophobic effect of amino
acids is the main force governing folding.
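To make the HP model concrete, the following minimal sketch scores a chain placed on a 2D square lattice. It assumes the common convention that every pair of hydrophobic (H) residues occupying adjacent lattice sites, but not consecutive in the chain, contributes -1 to the energy; the function name `hp_energy` and the coordinate-list representation are illustrative choices, not taken from the paper.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on the 2D square lattice.

    sequence: string over {'H', 'P'}; coords: list of (x, y) lattice sites,
    one per residue, in chain order. Assumed convention: -1 for each pair of
    H residues that are lattice neighbours but not chain neighbours.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    index_at = {pos: i for i, pos in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


# Four residues folded around a unit square: the two end residues touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))
```

Under this convention, lower energy means more buried hydrophobic contacts, which is how the model expresses the hydrophobic effect as the driving force of folding.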
The correspondence between amino acids and positions within a lattice is called
an embedding of the protein. Finding the optimal embedding of a protein has been
shown to be NP-hard even for very simple lattice models [7,33]. Thus, heuristics and
approximation algorithms have become the most promising approach for the PSP. In
particular, several evolutionary algorithms have been suggested for solving this
problem [12,18,19,27,34]. All these approaches employ a direct encoding of the
folded chain (see Section 2).
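As an illustration of what an embedding and a direct encoding can look like, the sketch below decodes a string of absolute moves on the 2D square lattice into one lattice position per residue and checks that no site is used twice. The move alphabet `U`, `D`, `L`, `R` and the helper names are assumptions made for this example; the encoding actually used in the cited work is the one described in Section 2 and may differ.

```python
# Assumed absolute-move alphabet for the 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def decode(moves):
    """Map a move string of length n-1 to the n lattice sites of the chain."""
    x, y = 0, 0
    coords = [(x, y)]  # the first residue is placed at the origin
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def is_valid_embedding(coords):
    """An embedding is valid when it is self-avoiding: no site is reused."""
    return len(set(coords)) == len(coords)


fold = decode("RUL")
print(fold, is_valid_embedding(fold))
```

In such a direct encoding every symbol fixes the position of one residue relative to its predecessor, so the genotype has the same length as the folded chain (minus one).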