Hindawi Publishing Corporation
International Journal of Genomics
Volume 2013, Article ID 670623, 11 pages
http://dx.doi.org/10.1155/2013/670623

Research Article

Global Alignment of Pairwise Protein Interaction Networks for Maximal Common Conserved Patterns

Wenhong Tian 1 and Nagiza F. Samatova 2,3

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 Department of Computer and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
3 Computer Science Department, North Carolina State University, Raleigh, NC 27696, USA

Correspondence should be addressed to Wenhong Tian; tian_wenhong@uestc.edu.cn

Received 22 December 2012; Revised 5 February 2013; Accepted 23 February 2013

Academic Editor: G. Pesole

Copyright © 2013 W. Tian and N. F. Samatova. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A number of tools for the alignment of protein-protein interaction (PPI) networks have laid the foundation for PPI network analysis. Most alignment tools focus on finding conserved interaction regions across PPI networks through either local or global mapping of similar sequences. Researchers are still trying to improve the speed, scalability, and accuracy of network alignment. In view of this, we introduce HopeMap, a fast algorithm for network alignment based on connected components. Observing that the number of true orthologs across species is small compared with the total number of proteins in all species, we take a different approach based on a precompiled list of homologs identified by KO terms. Applying this approach to S. cerevisiae (yeast) and D. melanogaster (fly), E. coli K12 and S. typhimurium, and E. coli K12 and C. crescentus, we analyze all clusters identified in the alignment.
The results are evaluated through up-to-date known gene annotations, gene ontology (GO), and KEGG ortholog groups (KO). Compared with existing tools, our approach is fast, with linear computational cost; highly accurate in terms of KO and GO term specificity and sensitivity; and easily extended to multiple alignments.

1. Introduction

Protein-protein interactions (PPI) are of central importance for virtually every process in a living cell. For example, information about these interactions improves our understanding of diseases and can provide the basis for new therapeutic approaches [1]. One of the fundamental goals of systems biology is to understand how proteins in the cell interact with each other. However, finding all protein interactions is costly and labor intensive. For example, to find all pairwise interactions for a species with 5000 proteins, one needs to perform 5000 × 4999 / 2 = 12,497,500 pairwise tests. This is one reason that the currently known set of direct interactions is incomplete. High-throughput experimental techniques (e.g., yeast two-hybrid and coimmunoprecipitation tests) can be helpful in this case. Integrated probability models are also used to predict protein-protein interactions [1, 2]. Quite a few databases, including DIP [3], IntAct [4], BioGRID [5], HPRD [6], and IntPro [7], are publicly available for collecting and storing PPI network data. Researchers [1, 8–14] are trying to identify conserved patterns, such as ortholog groups and functionally similar pathways/complexes, across species using PPI network data. Figure 1 provides an example of global visualization of protein interaction networks. Finding the exact solution for identifying conserved regions across species, that is, the network alignment problem, is NP-hard [1, 8–14]. This challenge has led many researchers to seek efficient heuristic solutions to the problem.

A powerful way of representing and analyzing PPI network data is to use network models and classical graph-theoretical approaches [16, 17].
In a PPI network, each protein is represented as a node, and a direct physical interaction between two proteins is represented as an edge. When identifying conserved patterns across PPI networks, proteins with highly similar sequences (homologs) are first identified, then conserved interactions are clustered, and finally the functional similarity of each cluster is validated.
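
The first two steps described above can be illustrated with a minimal Python sketch. It is not the HopeMap implementation: the function names, the toy protein identifiers, and the ortholog-group labels (`K1`, `K2`, ...) are hypothetical and chosen only for illustration. The sketch assumes that homologs are given as a precompiled mapping from each protein to an ortholog-group label, treats an interaction as conserved when both of its endpoints map to the same pair of labels in the two networks, and clusters the conserved interactions by connected components.

```python
from collections import defaultdict


def conserved_edges(edges_a, edges_b, group_a, group_b):
    """Pair up interactions whose endpoints share ortholog-group labels.

    edges_a, edges_b: iterables of (protein, protein) undirected interactions.
    group_a, group_b: dicts mapping protein -> ortholog-group label
                      (a hypothetical stand-in for a precompiled homolog list).
    """
    # Index network B's edges by the unordered pair of labels they connect.
    by_groups = defaultdict(list)
    for u, v in edges_b:
        if u in group_b and v in group_b:
            by_groups[frozenset((group_b[u], group_b[v]))].append((u, v))

    conserved = []
    for u, v in edges_a:
        if u in group_a and v in group_a:
            key = frozenset((group_a[u], group_a[v]))
            for edge_b in by_groups.get(key, []):
                conserved.append(((u, v), edge_b))
    return conserved


def connected_components(edges):
    """Group the nodes of an undirected edge list into connected components."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                stack.append(neighbor)
        components.append(component)
    return components


# Toy networks (invented identifiers, not real data).
yeast_edges = [("y1", "y2"), ("y2", "y3"), ("y4", "y5")]
fly_edges = [("f1", "f2"), ("f2", "f3"), ("f6", "f7")]
yeast_groups = {"y1": "K1", "y2": "K2", "y3": "K3", "y4": "K4", "y5": "K5"}
fly_groups = {"f1": "K1", "f2": "K2", "f3": "K3", "f6": "K6", "f7": "K7"}

conserved = conserved_edges(yeast_edges, fly_edges, yeast_groups, fly_groups)
clusters = connected_components([edge_a for edge_a, _ in conserved])
print(clusters)
```

Indexing one network's edges by label pair keeps the matching step close to linear in the number of edges when ortholog groups are small. The third step, functional validation of each cluster (for example, against GO or KO annotations), is omitted here because it depends on external annotation data.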