Exploring the Adenylation Domain Repertoire of Nonribosomal Peptide Synthetases Using an Ensemble of Sequence-Search Methods

Guillermin Agüero-Chapin 1,2,3, Reinaldo Molina-Ruiz 2, Emanuel Maldonado 1, Gustavo de la Riva 4, Aminael Sánchez-Rodríguez 5, Vitor Vasconcelos 1,3, Agostinho Antunes 1,3 *

1 CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
2 Molecular Simulation and Drug Design (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara, Cuba
3 Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
4 Departamento de Biología, Instituto Tecnológico Superior de Irapuato (ITESI), Carretera Irapuato-Silao Km. 12.5, El Copal, Irapuato, Guanajuato, México
5 CMPG, Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium

Abstract

The introduction of two-dimensional (2D) graphs and their numerical characterization for comparative analyses of DNA/RNA and protein sequences without the need for sequence alignments is an active yet recent research topic in bioinformatics. Here, we used a 2D artificial representation (four-color maps) with a simple numerical characterization through topological indices (TIs) to aid the discovery of remote homologs of Adenylation domains (A-domains) from the Nonribosomal Peptide Synthetases (NRPS) class in the proteome of the cyanobacterium Microcystis aeruginosa. Cyanobacteria are a rich source of structurally diverse oligopeptides that are predominantly synthesized by NRPS. Several A-domains share amino acid identities lower than 20%, making them a possible source of remote homologs. Therefore, A-domains cannot be easily retrieved by BLASTp searches using a single template. To cope with the sequence diversity of the A-domains, we combined homology-search methods with an alignment-free tool that uses protein four-color maps.
TI2BioP (Topological Indices to BioPolymers) version 2.0, available at http://ti2biop.sourceforge.net/, allowed the calculation of simple TIs from the protein sequences (four-color maps). Such TIs were used as input predictors for the statistical estimations required to build the alignment-free models. We concluded that the use of graphical/numerical approaches in cooperation with other sequence search methods, like multi-template BLASTp and profile HMM, can give the most complete exploration of the repertoire of highly diverse protein families.

Citation: Agüero-Chapin G, Molina-Ruiz R, Maldonado E, de la Riva G, Sánchez-Rodríguez A, et al. (2013) Exploring the Adenylation Domain Repertoire of Nonribosomal Peptide Synthetases Using an Ensemble of Sequence-Search Methods. PLoS ONE 8(7): e65926. doi:10.1371/journal.pone.0065926

Editor: Christos A. Ouzounis, The Centre for Research and Technology, Hellas, Greece

Received November 21, 2012; Accepted May 1, 2013; Published July 16, 2013

Copyright: © 2013 Agüero-Chapin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) for financial support to GACH (SFRH/BD/47256/2008), and the projects PTDC/AAC-AMB/104983/2008 (FCOMP-01-0124-FEDER-008610), PTDC/AAC-CLI/116122/2009 (FCOMP-01-0124-FEDER-014029), and PesT-C/MAR/LA0015/2011 to AA. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.
* E-mail: aantunes@ciimar.up.pt

Introduction

Chemical Graph Theory (CGT) applies graph theory to the combinatorial and topological exploration of chemical molecular structure. Currently, CGT is being extended to bioinformatics through the introduction of two-dimensional (2D) graphs for comparative analyses of DNA/RNA and proteins without the use of sequence alignments. These 2D graphs or maps do not represent the "real structure" of the natural biopolymers, but they have been very effective for inspecting similarities and dissimilarities among them, either by direct visualization or by numerical characterization [1]. Examples of 2D artificial representations of DNA and protein sequences with potential in bioinformatics include the spectrum-like, star-like, cartesian-type and four-color maps [1–5]. These DNA/RNA and protein maps can generally unravel useful higher-order information contained beyond the primary structure, i.e. the nucleotide/amino acid distribution in a 2D space. Their essence can be captured quantitatively through numerical indices, making it easy to compare a great number of sequences/maps [6–8]. One of the simplest numerical characterizations of sequences involves the use of topological indices. Topological Indices (TIs) are based on the connectivity between the elements composing the 2D graph, in terms of whether they are connected or not [9,10]. While several types of 2D maps have been developed for DNA/RNA and proteins, including their numerical characterization [11], the application of four-color maps in bioinformatics remains largely unexplored, being limited to illustrative examples on the comparative characterization of DNA and protein sequences [12]. However, four-color maps and their numerical characterization can be combined with traditional homology search tools (e.g.
BLAST, HMMs) to carry out an exhaustive exploration of functional signatures in highly diverse gene/protein families. Such exploration is effective when all family members are retrieved, including remote homologs. Remote homologs are divergent gene/protein sequences that have conserved the same biological function in different organisms. They can be found in the twilight zone of alignment algorithms (<30% amino acid identity) and have traditionally been detected by the use of more