Ant Colony Optimization for Construction of Common Pattern of the Protein Motifs

J. Altamiranda 1, J. Aguilar 1,3, and C. Delamarche 2

1 Computer Department, University of Los Andes, Mérida, Venezuela, {altamira, aguilar}@ula.ve
2 Structure et Dynamique des macromolecules, University of Rennes I, Rennes, France, christian.delamarche@univ-rennes1.fr
3 Prometeo Researcher, Universidad Técnica Particular de Loja, Ecuador

Abstract - This work presents an approach for constructing common patterns of amyloid protein motifs, extracted from the AMYPdb database and expressed as regular expressions following the PROSITE rules. Our task is to analyze a set of candidate motifs and to detect whether similarity exists between them, in order to construct a general motif. The Ant Colony Optimization model uses a combinatorial optimization algorithm based on ant colonies. It uses the amino acids of the first motif to construct the graph on which the ants walk. The ants then traverse the graph following the path of the second motif, guided by a transition function that promotes moves between similar amino acids. As they walk, the ants deposit pheromone on the nodes, so that at the end some nodes hold a lot of pheromone and others hold little. Finally, the graph is traversed again to construct the resulting regular expression, composed of the nodes with the most pheromone.

Keywords: Bioinformatics, Ant Colony Optimization, Proteins, Biology Computing, Biological Process

1 Introduction

This paper defines and develops a computational model for the construction of common patterns of protein motifs. It proposes an algorithm based on Ant Colony Optimization (ACO) [1], with some modifications. This algorithm can efficiently find the union between two motifs and allows the generation of a new motif.
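The node-pheromone idea summarized in the abstract can be illustrated with a toy sketch. It is not the algorithm of this paper: the transition rule (pheromone^alpha times similarity^beta), the Jaccard similarity between residue sets, the deposit rule, the mean-based cut-off, and all names (`walk_ants`, `strong_nodes`, `residues`) are assumptions made only for illustration. Each motif is represented as a list of residue sets, with `x` standing for any amino acid.

```python
import random

ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids

def residues(symbol):
    """Hypothetical helper: 'x' means any residue, otherwise the listed ones."""
    return ALPHABET if symbol == "x" else frozenset(symbol)

def similarity(a, b):
    """Jaccard overlap of two residue sets (an assumed similarity measure)."""
    return len(a & b) / len(a | b)

def walk_ants(motif1, motif2, n_ants=50, alpha=1.0, beta=2.0, seed=0):
    """Let ants follow motif2 over the nodes of motif1; return node pheromone."""
    rng = random.Random(seed)
    pheromone = [1.0] * len(motif1)          # one value per node of motif1
    for _ in range(n_ants):
        pos = -1                             # the ant starts before node 0
        for element in motif2:
            candidates = range(pos + 1, len(motif1))
            if not candidates:               # no node left to move forward to
                break
            # Assumed transition rule: pheromone^alpha * similarity^beta.
            weights = [pheromone[i] ** alpha
                       * (similarity(motif1[i], element) + 1e-6) ** beta
                       for i in candidates]
            pos = rng.choices(candidates, weights=weights)[0]
            # Deposit on the visited node, scaled by how similar it was.
            pheromone[pos] += similarity(motif1[pos], element)
    return pheromone

def strong_nodes(pheromone):
    """Indices of nodes whose pheromone exceeds the mean (assumed cut-off)."""
    mean = sum(pheromone) / len(pheromone)
    return [i for i, p in enumerate(pheromone) if p > mean]

# Two illustrative motifs, written position by position.
S1 = [residues(r) for r in ["C", "x", "H", "x", "LIVMFY", "C", "x", "x", "C", "LIVMYA"]]
S2 = [residues(r) for r in ["H", "A", "M", "C", "x", "x", "C"]]
levels = walk_ants(S1, S2)
print(strong_nodes(levels))
```

In this sketch the pheromone lives on nodes rather than on edges, as the abstract describes, and the fixed seed only makes the run repeatable. The nodes returned by `strong_nodes` would be the candidates for the resulting expression.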
Two important bioinformatics tools, BLAST [2] and FASTA [3], were developed in response to the need for new knowledge about sequences and protein motifs, using the information stored in biological databases. For multiple alignment of protein sequences there is CLUSTAL, software that provides comprehensive multiple alignment using progressive strategies for aligning DNA and protein sequences of multiple species, and that helps to find common conserved domains [4]. However, problems remain to be solved in information discovery and data classification, among others.

Currently, there are several methods of pattern discovery (using regular expressions [5], [6], Hidden Markov Models (HMM) [7], automata, and PSSM matrices). Regular expressions are the most commonly used by biologists, along with the graphical LOGOS method, since they are visually simpler to understand and interpret [8], [9]. Historically, DNA motif discovery has used the Pratt method [10], which is based on the Knuth-Morris-Pratt algorithm [11], but there are other tools, among the best known being TEIRESIAS and MEME [12], [13], [14], [15], [16], [17]. The discovery of common motifs between sequences that are evolutionarily distant (non-homologous or non-related sequences) is a very complex problem. In addition, there are tools that allow comparing DNA motifs and SLMs (Short Linear Motifs) defined as regular expressions, such as CompariMotif [18], FunClust [19], and Bio.motif [20]. However, these tools do not allow fusing them into a common expression.

It is possible to discover relationships between proteins by constructing a common pattern between multiple motifs. The relationships can be illustrated by the following example. Consider three motifs: S1 (C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]), S2 (H-A-M-C-x(2)-C) and S3 (H-x-L-C-{R}-C). It can be observed that S2 and S3 are sub-motifs of S1. A common motif could be written as (H-x-[ML]-C-x(2)).
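To make the notation of this example concrete, the sketch below translates PROSITE-style patterns into Python regular expressions. It covers only the constructs that appear in S1, S2, S3 and the common motif (single residues, `x`, `[...]`, `{...}` and repetition counts). The function name `prosite_to_regex` and the test strings are hypothetical, and the sketch is not part of the method proposed in this paper.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    # Normalise the spacing variants used in the text: "x (2)" and "x-(2)".
    cleaned = pattern.replace(" ", "").replace("-(", "(")
    pieces = []
    for element in cleaned.split("-"):
        # Separate the residue part from an optional "(n)" or "(n,m)" count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                      # any amino acid
            piece = "."
        elif core.startswith("{"):           # forbidden residues
            piece = "[^" + core[1:-1] + "]"
        else:                                # single residue or [allowed set]
            piece = core
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return "".join(pieces)

common = prosite_to_regex("H-x-[ML]-C-x (2)")    # "H.[ML]C.{2}"
for seq in ("HAMCGGC", "HKLCSC", "HARCGGC"):     # made-up sequences
    print(seq, bool(re.search(common, seq)))
```

With this translation, the common motif can be checked against any sequence with `re.search`, which is one simple way to test whether a fused pattern still matches the sequences covered by the original motifs.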
Assuming that S1, S2 and S3 are specific motifs of three different families, and that the common motif does not match any sequence outside families 1, 2 and 3, it represents a consensus motif. As in this example, other biological analyses could be carried out for groups of motifs. This requires defining a method that generates a common pattern for them and then tests its quality.

Specifically, our task is to analyze a set of protein motifs stored in a database, detect whether there are similarities between them, and construct general patterns. The patterns found can be explained by the existence of segments that have been conserved during the natural evolution of proteins, and they suggest that the regions obtained play a functional role in the mechanisms and structure of the proteins. Most motif search algorithms use heuristic techniques to obtain near-optimal solutions at a relatively low computational cost [9]. For

Int'l Conf. Bioinformatics and Computational Biology | BIOCOMP'15 | 43