Computational Biology and Chemistry 27 (2003) 575–580

Software note

Use of a novel Hill-climbing genetic algorithm in protein folding simulations

Lee R. Cooper a, David W. Corne b, M. James C. Crabbe a,*

a School of Animal and Microbial Sciences, University of Reading, Whiteknights, Reading RG6 6AJ, UK
b Department of Computer Science, University of Reading, Whiteknights, Reading RG6 6AY, UK

Received 19 May 2003; received in revised form 25 June 2003; accepted 28 June 2003

Abstract

We have developed a novel Hill-climbing genetic algorithm (GA) for simulation of protein folding. The program (written in C) builds a set of Cartesian points to represent an unfolded polypeptide’s backbone. The dihedral angles determining the chain’s configuration are stored in an array of chromosome structures that is copied and then mutated. The fitness of the mutated chain’s configuration is determined by its radius of gyration. A four-helix bundle was used to optimise simulation conditions, and the program was compared with other, larger, genetic algorithms on a variety of structures. The program ran 50% faster than other GA programs. Overall, tests on 100 non-redundant structures gave comparable results to other genetic algorithms, with the Hill-climbing program running between 20 and 50% faster. Examples including crambin, cytochrome c, cytochrome B and hemerythrin gave good secondary structure fits, with overall alpha carbon atom rms deviations of between 5 and 5.6 Å, using an optimised hydrophobic term in the fitness function.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Fitness function; Crambin; Gamma crystallin; Supercomputer; GA

1. Introduction

Modelling of a protein’s folding reaction is a formidable optimisation task equivalent to searching an energy landscape of limitless dimensionality. Genetic algorithms (GAs) mimic the strategy of natural selection, and are well suited to optimising solutions over large combinatorial spaces.
Selection of parents in a GA is by a fitness function, encompassing and balancing the driving forces of folding. Dandekar and Argos (1992, 1994) investigated the effect of different folding forces and their importance using four-helix bundle proteins. The same authors had similar success (Dandekar and Argos, 1996) with other small proteins, whether these were largely helical, mixed or beta-strand rich. Fitness criteria, simulating the forces with appropriate parameters and weights determined in the idealised cases, were used in a GA to predict the tertiary backbone fold of proteins with experimentally known structures under 120 residues in length.

The method employed in our study exploits a novel Hill-climbing algorithm to improve the power of GAs, particularly in relation to protein folding. The main chain of a protein is folded from a knowledge of the primary sequence and predictions of its secondary structure. Reliable secondary structure predictions are returned as output from the Protein Predict server (http://www.ebi.ac.uk/rost/predictprotein/submit_adv.html), which has been trained using over 700 solved protein structures.

2. Materials and methods

2.1. Computing

A C program (PSP-sGA) was developed based on a genetic algorithm described by Goldberg (1989). Conformations were represented as arrays of paired integers. The dihedral angles determining the chain’s configuration are stored in an array of chromosome structures that is copied and then mutated. The fitness of the mutated chain’s configuration is determined by its radius of gyration.

* Corresponding author. E-mail address: m.j.c.crabbe@rdg.ac.uk (M.J.C. Crabbe).

1476-9271/$ – see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S1476-9271(03)00047-1