Swaps in Protein Sequences
Amit Fliess, Benny Motro, and Ron Unger*
Faculty of Life Science, Bar-Ilan University, Ramat-Gan, Israel
ABSTRACT An important question in protein
evolution is to what extent proteins may have under-
gone swaps (switches of domain or fragment order)
during evolution. Such events might have occurred
in several forms: Swaps of short fragments, swaps of
structural and functional motifs, or recombination
of domains in multidomain proteins. This question
is important for the theoretical understanding of
the evolution of proteins, and has practical implica-
tions for using swaps as a design tool in protein
engineering. In order to analyze the question systematically,
we conducted a large-scale survey of possible
swaps and permutations among all pairs of
proteins from the Swissprot database. A swap is
defined as a specific kind of sequence mutation
between two proteins in which two fragments that
appear in both sequences have different relative
order in the two sequences. For example, aXbYc and
dYeXf are defined as a swap, where X and Y repre-
sent sequence fragments that switched their order.
Identifying such swaps is difficult using standard
sequence comparison packages. One of the main
problems in the analysis stems from the fact that
many sequences contain repeats, which may be
identified as false-positive swaps. We have used two
different approaches to detect pairs of proteins with
swaps. The first approach is based on the predefined
list of domains in Pfam. We identified all the pro-
teins that share at least two domains and analyzed
their relative order, looking for pairs in which the
order of these domains was switched. We designed
an algorithm to distinguish between real swaps and
duplications. In the second approach, we used Blast
to detect pairs of proteins that share several frag-
ments. Then, we used an automatic procedure to
select pairs that are likely to contain swaps. Those
pairs were analyzed visually, using a graphical tool,
to eliminate duplications. Combining these ap-
proaches, about 140 different cases of swaps in the
Swissprot database were found (after eliminating
multiple pairs within the same family). Some of the
cases have been described in the literature, but
many are novel examples. Although each new ex-
ample identified may be interesting to analyze, our
main conclusion is that cases of swaps are rare in
protein evolution. This observation is at odds with
the common view that proteins are very modular to
the point that modules (e.g., domains) can be shuffled
between proteins with minimal constraints. Our
study suggests that sequential constraints, i.e., the
relative order between domains, are highly con-
served. Proteins 2002;48:377–387.
© 2002 Wiley-Liss, Inc.
Key words: swaps; protein domains; circular permu-
tations; sequence comparison
INTRODUCTION
Analysis of the domain structure of proteins has led to
the suggestion that protein domains often function as
independent entities (see, e.g., Khosla and Harbury¹). This
assumption would lead one to suspect that swaps of
protein sequence may have occurred commonly in evolu-
tion.
Knowing the nature and frequency of swaps is important
for the theoretical understanding of the evolution of
proteins and for appreciating the possibilities and limitations
of protein engineering; it also has practical implications
for protein sequence comparisons.
We define a swap as a specific kind of sequence mutation
between a pair of proteins in which a fragment of one
sequence can be found in the other protein out of its
original sequential order.
Formally, we define sequences S₁ and S₂ to contain a
swap if S₁ = aXbYc and S₂ = dY'eX'f, where X is similar to
X' and Y is similar to Y' under some sequence similarity
measure. (The “filler” sequences a, b, c, d, e, f do not all have
to exist in each example.) A pair of proteins may contain
more than one swap.
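The definition can be made concrete with a small sketch. The Python snippet below is illustrative only and is not the procedure used in this study. It assumes that shared fragments have already been found by some similarity search and are supplied as hypothetical (start in the first sequence, start in the second sequence) coordinate pairs; the function name and the coordinates are invented for the example. It reports every pair of fragments whose relative order differs between the two sequences, ignoring fragment lengths and overlaps.

```python
from itertools import combinations
from typing import List, Tuple

# A match is a hypothetical (start_in_s1, start_in_s2) pair for one shared fragment.
Match = Tuple[int, int]

def find_swaps(matches: List[Match]) -> List[Tuple[Match, Match]]:
    """Return pairs of matches whose order in the first sequence is reversed in the second."""
    swaps = []
    for m1, m2 in combinations(sorted(matches), 2):
        # After sorting, m1 starts no later than m2 in the first sequence.
        # A swap requires m1 to start strictly earlier in the first sequence
        # and strictly later in the second.
        if m1[0] < m2[0] and m1[1] > m2[1]:
            swaps.append((m1, m2))
    return swaps

# aXbYc versus dY'eX'f: X at 10 and Y at 50 in the first sequence,
# Y' at 5 and X' at 60 in the second (made-up coordinates).
swapped = [(10, 60), (50, 5)]
# Same fragments in the same order in both sequences: no swap expected.
collinear = [(10, 5), (50, 60)]

print(find_swaps(swapped))
print(find_swaps(collinear))
```

A check of this kind cannot by itself tell a real swap from a repeat: if a fragment occurs twice in one of the sequences, one of its copies may appear out of order even though another copy is collinear, so flagged pairs would still need further filtering.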
Note that this is an operational definition that covers
several evolutionary scenarios, which are not easy to
retrieve. One class of swaps might be a result of genetic
events in an evolving genome. One possibility is a direct
genetic swap event in which two fragments of DNA
transposed their relative position in a gene and thus
created a “swapped” sequence. The other possibility might
be a combination of duplication and deletion events. For
example, if the original gene contained two domains, AB,
then a duplication will result in ABAB, and subsequent
deletions of the flanking domains will result in a gene of
the form BA.
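As a toy illustration of the duplication and deletion scenario, the following sketch represents a gene as a Python list of single-letter domain labels. This representation and the function name are assumptions made here for illustration, not a model of the underlying genetic mechanism.

```python
from typing import List

def duplicate_then_trim(domains: List[str]) -> List[str]:
    """Tandem-duplicate a domain list, then delete the two flanking domains."""
    duplicated = domains + domains   # AB -> ABAB
    return duplicated[1:-1]          # ABAB -> BA (first and last domains removed)

result = duplicate_then_trim(["A", "B"])
print("".join(result))
```

The resulting order is the reverse of the original two-domain arrangement, although no fragment was ever transposed directly.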
Another class might be of cases where two proteins were
“fused” from similar modules, but the modules were as-
Grant sponsor: Israel Science Foundation; Grant number: 569/98
(Ru).
*Correspondence to: Ron Unger, Faculty of Life Science, Bar Ilan
University, Ramat-Gan, 52900, Israel. E-mail: ron@biocom1.ls.biu.ac.il
Received 27 July 2001; Accepted 7 March 2002
Published online 00 Month 0000 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/prot.10156
PROTEINS: Structure, Function, and Genetics 48:377–387 (2002)