JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 16, Number 1, 2009
© Mary Ann Liebert, Inc.
Pp. 85–103
DOI: 10.1089/cmb.2008.0082

Extended HP Model for Protein Structure Prediction

TAMJIDUL HOQUE,1 MADHU CHETTY,2 and ABDUL SATTAR1

ABSTRACT

This paper describes a detailed investigation of a lattice-based HP (hydrophobic-hydrophilic) model for ab initio protein structure prediction (PSP). The simplified HP lattice model produces highly degenerate outcomes, which can mislead the prediction. The HPNX model was proposed to address this degeneracy problem as well as to avoid the conformational deformity associated with the hydrophilic (P) residues. We show experimentally that the existing HPNX model needs further improvement. We have also identified and corrected a critical error in another existing model, YhHX. By extracting the significant features of YhHX and incorporating them into the HPNX model, we propose a novel hHPNX model. A hybrid genetic algorithm (HGA) was used to compare the predictive performance of these models, and hHPNX outperformed the others. We chose the 3D face-centered cubic (FCC) lattice configuration because it most closely resembles real folded 3D proteins.

Key words: protein structure prediction, novel low resolution model, genetic algorithm.

1. INTRODUCTION

FOR AN EFFECTIVE AND FASTER EXPLORATION of the protein structure prediction (PSP) landscape, various types of lattice models are used and have proven useful for investigations. Usually, a particular lattice model is adopted to restrict the protein structure space (Wroe et al., 2005) to structures that can be encoded, which would not have been possible (Alm et al., 2002) in the unrestricted, continuous, and complex structure space.
1 Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Nathan, QLD, Australia.
2 Gippsland School of Information Technology (GSIT), Monash University, Churchill, VIC, Australia.

The practical usefulness of low-resolution modeling for solving the ab initio PSP problem has been demonstrated elsewhere (Baker, 2006; Chivian et al., 2003; Samudrala et al., 1999; Hinds and Levitt, 1994; Koehl and Levitt, 1999; Kolinski et al., 2003; Schueler-Furman et al., 2005; Xia et al., 2000). When high-resolution models are to be used, they can be applied to a smaller pool of approximate conformations, obtained by selecting the superior solutions of a simplified (i.e., low-resolution) lattice model from a much larger pool of approximate conformations. This two-stage hierarchical paradigm reduces the overall computational time required for solving the ab initio problem. For instance, Samudrala et al. (1999) used a simple tetrahedral lattice model to take 10,000 fit samples from a pool of 10 million possible conformations; those 10,000 samples were then improved for further investigation, which helps to scale down the number of fitter solutions further in the next step.

Among the various lattice models based on different numbers of beads, the hydrophobic-hydrophilic (HP) lattice model, being simple, has always played a vital role in research on the PSP problem. The rationale