Hindawi Publishing Corporation
International Journal of Genomics
Volume 2013, Article ID 670623, 11 pages
http://dx.doi.org/10.1155/2013/670623
Research Article
Global Alignment of Pairwise Protein Interaction Networks for
Maximal Common Conserved Patterns
Wenhong Tian 1 and Nagiza F. Samatova 2,3
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 Department of Computer and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
3 Computer Science Department, North Carolina State University, Raleigh, NC 27696, USA
Correspondence should be addressed to Wenhong Tian; tian wenhong@uestc.edu.cn
Received 22 December 2012; Revised 5 February 2013; Accepted 23 February 2013
Academic Editor: G. Pesole
Copyright © 2013 W. Tian and N. F. Samatova. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
A number of tools for the alignment of protein-protein interaction (PPI) networks have laid the foundation for PPI network analysis.
Most alignment tools focus on finding conserved interaction regions across PPI networks through either local or global
mapping of similar sequences. Researchers are still trying to improve the speed, scalability, and accuracy of network alignment. In
view of this, we introduce a fast algorithm based on connected components, HopeMap, for network alignment. Observing that the
number of true orthologs across species is small compared to the total number of proteins in all species, we take a different approach
based on a precompiled list of homologs identified by KO terms. Applying this approach to S. cerevisiae (yeast) and D. melanogaster
(fly), E. coli K12 and S. typhimurium, and E. coli K12 and C. crescentus, we analyze all clusters identified in the alignment. The results
are evaluated using up-to-date known gene annotations, gene ontology (GO) terms, and KEGG ortholog (KO) groups. Compared to
existing tools, our approach is fast, with linear computational cost; highly accurate in terms of KO and GO term specificity and
sensitivity; and easily extended to multiple alignments.
1. Introduction
Protein-protein interactions (PPI) are of central importance
for virtually every process in a living cell. For example, infor-
mation about these interactions improves our understanding
of diseases and can provide the basis for new therapeutic
approaches [1]. One of the fundamental goals of systems biology
is to understand how proteins in the cell interact with each
other. However, finding all protein interactions is costly and
labor intensive. For example, to find all pairwise interactions
for a species with 5000 proteins, one needs to do 12497500
pairwise tests. This is one reason that the currently known direct
interactions are incomplete. High-throughput experimental
techniques (e.g., yeast two-hybrid and coimmunoprecipitation
tests) can be helpful in this case. Integrated probability
models are also used to predict protein-protein
interactions [1, 2]. Several databases, including DIP [3], IntAct [4],
BioGRID [5], HPRD [6], and IntPro [7], are publicly available
for collecting and storing PPI network data. Researchers [1, 8–14]
are trying to identify conserved patterns such as ortholog
groups and functionally similar pathways/complexes across
species using PPI network data. Figure 1 provides an example
of global visualization of protein interaction networks.
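As a quick check of the pairwise-test count quoted above for 5000 proteins, the number of unordered pairs among n proteins is n(n-1)/2. The following is a minimal Python sketch; the function name pairwise_tests is our own illustration and does not come from the paper.

```python
from math import comb


def pairwise_tests(n_proteins: int) -> int:
    """Number of unordered protein pairs, n(n-1)/2 (hypothetical helper)."""
    return comb(n_proteins, 2)


# Count for the 5000-protein example used in the text.
tests_for_5000 = pairwise_tests(5000)
print(tests_for_5000)
```

The count grows quadratically with the number of proteins, which is why exhaustive pairwise testing quickly becomes impractical.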
Identifying conserved regions across species exactly, that is,
solving the network alignment problem, is NP-hard
[1, 8–14]. This challenge has led many researchers to look for
efficient heuristic solutions to the problem.
A powerful way of representing and analyzing PPI
network data is to use network models and classical
graph-theoretical approaches [16, 17]. In a PPI network, each protein
is represented as a node and each direct physical interaction
between proteins as an edge. When identifying conserved
patterns across PPI networks, proteins with highly similar
sequences (homologs) are first identified, then conserved
interactions are clustered, and finally the functional similarity
of each cluster is validated.
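The first two steps of this workflow can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the HopeMap implementation: each network is given as a list of undirected edges, the homolog list is a precompiled dictionary mapping a protein in one species to its homologs in the other, and clusters are taken to be connected components of the resulting alignment graph. The function name conserved_components and all protein identifiers are hypothetical.

```python
from collections import defaultdict, deque


def conserved_components(edges_a, edges_b, homologs):
    """Cluster conserved interactions into connected components.

    edges_a, edges_b: iterables of (protein, protein) undirected edges.
    homologs: dict mapping a protein in network A to a set of homologs in B
              (assumed to be precompiled, e.g., from shared ortholog terms).
    Returns a list of sets of aligned (a, b) protein pairs.
    """
    b_edges = {frozenset(e) for e in edges_b}

    # Alignment graph: nodes are homolog pairs (a, b); two nodes are linked
    # when the interaction is present in both networks.
    adj = defaultdict(set)
    for a1, a2 in edges_a:
        for b1 in homologs.get(a1, ()):
            for b2 in homologs.get(a2, ()):
                if b1 != b2 and frozenset((b1, b2)) in b_edges:
                    adj[(a1, b1)].add((a2, b2))
                    adj[(a2, b2)].add((a1, b1))

    # Breadth-first search to collect connected components.
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Toy example with made-up protein identifiers.
edges_a = [("yA", "yB"), ("yB", "yC"), ("yD", "yE")]
edges_b = [("fA", "fB"), ("fB", "fC")]
homologs = {"yA": {"fA"}, "yB": {"fB"}, "yC": {"fC"},
            "yD": {"fD"}, "yE": {"fE"}}
clusters = conserved_components(edges_a, edges_b, homologs)
```

In this toy input, the interaction between yD and yE has no counterpart in the second network, so it does not contribute to any cluster. The third step, functional validation of each cluster, depends on external annotations and is not sketched here.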