Probing the protein space for extending the detection of weak homology folds

Danilo Gullotto a,b,*, Mario Salvatore Nolassi a, Andrea Bernini b, Ottavia Spiga b, Neri Niccolai b

a Advanced Computational Biostructural Research Collaboratory, I-95019 Zafferana Etnea, Italy
b Department of Biotechnology, Chemistry and Pharmacy, Università di Siena, Via Fiorentina 1, I-53100 Siena, Italy

HIGHLIGHTS

- We attempted to extend the detection of protein remote homology folds.
- Models detected share both high accuracy and low sequence identity.
- Our algorithm could be a complementary tool for exploring protein motifs.
- Motifs detected by our algorithm often share sequence identity lower than 20%.
- Domains with remote homology were detected by merging motif trajectories.

ARTICLE INFO

Article history:
Received 16 July 2012
Received in revised form 3 November 2012
Accepted 5 December 2012
Available online 19 December 2012

Keywords:
Protein motif
Structure prediction
Protein fold
Remote homology
Structural bioinformatics

ABSTRACT

Redundancy of prediction methods has been used to explore the occurrence of weak homology protein motifs. A hybrid template-based algorithm has been implemented to predict different layers of protein structure by detecting domain-building substructures that share low sequence identity. Physicochemical determinants, secondary structure profiles, and multiple alignments have been analyzed to generate a broad set of structural sub-domains. Then, intensive computing procedures generated the various three-dimensional folds on the basis of secondary structure predictions, fragment assembly and detection of structural homologs. The proposed algorithm not only identifies common protein substructures, but also detects higher order architectures such as domain superfamilies/superfolds by linking backbone trajectories of supersecondary structures.
Applying rigid transformation protocols yielded a population of detected domain-building models with an average root mean square deviation from native structures of 2.3 Å and an average template modeling score with respect to native structures of 0.43. The fold detection algorithm proposed here yields more accurate results than previously proposed methods, predicting structural homology even for proteins sharing less than 20% sequence identity. Our tools are freely available at http://www.acbrc.org/tools.html.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

How protein folding and sequence homology are related is still an unsolved problem, even though it is apparent that a specific three-dimensional protein structure depends on the information stored in the corresponding amino acid sequence (Levinthal, 1969; Ben-Naim, 2012). The fact that proteins under evolutionary pressure may retain similar folds while sharing low sequence identity suggests that structural divergence, driven by the accumulation of mutational events, proceeds less rapidly than sequence divergence (Chothia and Lesk, 1986; Todd et al., 2001; Harrison et al., 2002). It has been observed that proteins are frequently composed of subunits whose structure and function can evolve independently of the whole protein, allowing a hierarchical approach to protein classification (Murzin et al., 1995; Orengo et al., 1997). Protein conformational space, in spite of the huge number of possible amino acid combinations, contains only a limited number of canonical folds (Jones and Thirup, 1986; Chothia, 1992; Orengo et al., 1994). This finding has been the basis of template-based modeling (TBM) and fold recognition procedures for analyzing homology of resolved protein structures (Chothia and Lesk, 1986).
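
As a minimal illustration of the low sequence identity criterion discussed above, the following Python sketch computes percent identity between two rows of a pairwise alignment. The function names, the gap character, and the choice of denominator (columns where both sequences have a residue) are assumptions made for illustration; the text does not state how identity was calculated. Only the 20% threshold is taken from the text.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both rows have a residue.

    Assumption: the inputs are two rows of one pairwise alignment, so they
    have equal length. Columns with a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both rows
    identical = 0  # compared columns where the residues match
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def below_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 20.0) -> bool:
    """True if the pair falls under the identity threshold (20% in the text)."""
    return percent_identity(aligned_a, aligned_b) < threshold


if __name__ == "__main__":
    # Toy alignment rows, invented for illustration only.
    row_a = "MKT-AYIAKQR"
    row_b = "MRSLAWLGK-R"
    print(percent_identity(row_a, row_b))
    print(below_identity_threshold(row_a, row_b))
```

Dividing by the shorter sequence length or by the full alignment length are equally common conventions, and they give different percentages for gapped alignments, so the convention should be stated whenever an identity threshold is reported.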
Journal of Theoretical Biology 320 (2013) 152–158
Journal homepage: www.elsevier.com/locate/yjtbi
ISSN 0022-5193
http://dx.doi.org/10.1016/j.jtbi.2012.12.005

* Corresponding author at: Università di Siena, Department of Biotechnology, Chemistry and Pharmacy, Via Fiorentina 1, 53100 Siena, Siena, Italy. Tel.: +39 0957081078.
E-mail addresses: biores@acbrc.org (D. Gullotto), marionolassi@gmail.com (M.S. Nolassi), andrea.bernini@unisi.it (A. Bernini), ottavia.spiga@unisi.it (O. Spiga), neri.niccolai@unisi.it (N. Niccolai).

Current TBM methods can be classified into four general categories: (i) sequence–sequence comparison