Swaps in Protein Sequences

Amit Fliess, Benny Motro, and Ron Unger*

Faculty of Life Science, Bar-Ilan University, Ramat-Gan, Israel

ABSTRACT An important question in protein evolution is to what extent proteins may have undergone swaps (switches of domain or fragment order) during evolution. Such events might have occurred in several forms: swaps of short fragments, swaps of structural and functional motifs, or recombination of domains in multidomain proteins. This question is important for the theoretical understanding of the evolution of proteins, and has practical implications for using swaps as a design tool in protein engineering. In order to analyze the question systematically, we conducted a large-scale survey of possible swaps and permutations among all pairs of proteins from the Swissprot database. A swap is defined as a specific kind of sequence mutation between two proteins in which two fragments that appear in both sequences have different relative order in the two sequences. For example, aXbYc and dYeXf are defined as a swap, where X and Y represent sequence fragments that switched their order. Identifying such swaps is difficult using standard sequence comparison packages. One of the main problems in the analysis stems from the fact that many sequences contain repeats, which may be identified as false-positive swaps. We have used two different approaches to detect pairs of proteins with swaps. The first approach is based on the predefined list of domains in Pfam. We identified all the proteins that share at least two domains and analyzed their relative order, looking for pairs in which the order of these domains was switched. We designed an algorithm to distinguish between real swaps and duplications. In the second approach, we used Blast to detect pairs of proteins that share several fragments. Then, we used an automatic procedure to select pairs that are likely to contain swaps.
Those pairs were analyzed visually, using a graphical tool, to eliminate duplications. Combining these approaches, we found about 140 different cases of swaps in the Swissprot database (after eliminating multiple pairs within the same family). Some of the cases have been described in the literature, but many are novel examples. Although each new example identified may be interesting to analyze, our main conclusion is that cases of swaps are rare in protein evolution. This observation is at odds with the common view that proteins are very modular to the point that modules (e.g., domains) can be shuffled between proteins with minimal constraints. Our study suggests that sequential constraints, i.e., the relative order between domains, are highly conserved. Proteins 2002;48:377–387. © 2002 Wiley-Liss, Inc.

Key words: swaps; protein domains; circular permutations; sequence comparison

INTRODUCTION

Analysis of the domain structure of proteins has led to the suggestion that protein domains often function as independent entities (see, e.g., Khosla and Harbury¹). This assumption would lead one to suspect that swaps of protein sequence may have occurred commonly in evolution. Knowing the nature and frequency of swaps is important for theoretical understanding of the evolution of proteins and for appreciating the possibilities and limitations of protein engineering, and also has practical implications for protein sequence comparisons.

We define a swap as a specific kind of sequence mutation between a pair of proteins in which a fragment of one sequence can be found in the other protein out of its original sequential order. Formally, we define sequences S₁ and S₂ to contain a swap if S₁ = aXbYc and S₂ = dY'eX'f, where X is similar to X' and Y is similar to Y' under some sequence similarity measure. (The “filler” sequences a, b, c, d, e, and f do not all have to exist in each example.) A pair of proteins may contain more than one swap.
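The formal definition above can be illustrated with a minimal sketch. It is not the procedure used in this study: the `Match` record and the functions `find_swaps` and `exact_matches` are hypothetical names introduced here, "similarity" is reduced to exact substring identity, and repeats (the main source of false positives noted in the abstract) are not handled. The sketch only checks whether two shared fragments appear in opposite relative order in the two sequences.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Match:
    """A fragment found in both sequences (hypothetical record).

    Coordinates are 0-based, with exclusive end positions.
    """
    start1: int  # start of the fragment in S1
    end1: int    # end of the fragment in S1
    start2: int  # start of the similar fragment in S2
    end2: int    # end of the similar fragment in S2


def exact_matches(s1, s2, fragments):
    """Locate each fragment by its first exact occurrence in s1 and s2.

    Assumption: exact identity stands in for a real similarity measure.
    """
    found = []
    for frag in fragments:
        i, j = s1.find(frag), s2.find(frag)
        if i >= 0 and j >= 0:
            found.append(Match(i, i + len(frag), j, j + len(frag)))
    return found


def find_swaps(matches):
    """Return pairs (X, Y) with X before Y in S1 and Y' before X' in S2."""
    swaps = []
    for m, n in combinations(matches, 2):
        # Name the pair so that x is the fragment that comes first in S1.
        x, y = (m, n) if m.start1 <= n.start1 else (n, m)
        disjoint_in_s1 = x.end1 <= y.start1   # S1 = a X b Y c
        reversed_in_s2 = y.end2 <= x.start2   # S2 = d Y' e X' f
        if disjoint_in_s1 and reversed_in_s2:
            swaps.append((x, y))
    return swaps


# Toy sequences of the form aXbYc and dYeXf (invented for illustration).
X, Y = "ACDEFG", "HIKLMN"
s1 = "MK" + X + "LL" + Y + "PQ"
s2 = "RS" + Y + "TV" + X + "WY"
swaps = find_swaps(exact_matches(s1, s2, [X, Y]))
print(len(swaps), "swap(s) found")
```

In a realistic setting the `Match` records would come from a local alignment search rather than exact matching, and an additional step would be needed to discard pairs in which the apparent reversal is produced by a repeated fragment.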
Note that this is an operational definition that covers several evolutionary scenarios, which are not easy to reconstruct. One class of swaps might be a result of genetic events in an evolving genome. One possibility is a direct genetic swap event in which two fragments of DNA transposed their relative position in a gene and thus created a “swapped” sequence. The other possibility might be a combination of duplication and deletion events. For example, if the original gene contained two domains AB, then a duplication will result in ABAB, and subsequent deletions of the flanking domains will result in a gene of the form BA.

Grant sponsor: Israel Science Foundation; Grant number: 569/98 (Ru).
*Correspondence to: Ron Unger, Faculty of Life Science, Bar Ilan University, Ramat-Gan, 52900, Israel. E-mail: ron@biocom1.ls.biu.ac.il
Received 27 July 2001; Accepted 7 March 2002
Published online 00 Month 0000 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.10156
PROTEINS: Structure, Function, and Genetics 48:377–387 (2002) © 2002 WILEY-LISS, INC.

Another class might be of cases where two proteins were “fused” from similar modules, but the modules were as-