Cite this: Mol. BioSyst., 2012, 8, 790–795

Horizontal gene transfers as metagenomic gene duplications†

Luigi Grassi (a, ‡), Michele Caselle (a, b), Martin J. Lercher (c) and Marco Cosentino Lagomarsino* (d, e)

Received 9th August 2011, Accepted 30th November 2011
DOI: 10.1039/c2mb05330f
This journal is © The Royal Society of Chemistry 2012

While it is well accepted that horizontal gene transfer plays an important role in the evolution and the diversification of prokaryotic genomes, many questions remain open regarding its functional mechanisms of action and its interplay with the extant genome. This study addresses the relationship between proteome innovation by horizontal gene transfer and genome content in Proteobacteria. We characterize the transferred genes, focusing on their protein domain compositions and their relationships with the protein domain superfamilies already present in the genome. In agreement with previous observations, we find that the protein domain architectures of horizontally transferred genes are significantly shorter than the genomic average. Furthermore, protein domains that are more common in the total pool of genomes appear to have a proportionally higher chance of being transferred. This suggests that transfer events behave as if they were drawn randomly from a cross-genomic community gene pool, much like gene duplicates are drawn from a genomic gene pool. Finally, horizontally transferred genes carry domains of exogenous families less frequently in larger genomes, although they may do so more often than expected by chance.

Introduction

Prokaryotic genomes are highly diverse in their gene contents. This variability is related to variation in lifestyles and habitats, and is to a large part achieved by widespread horizontal gene transfer (HGT). 1 HGT, the acquisition of genetic material in a non-hereditary manner, is considered to be the main innovative force in bacterial evolution.
2–6 There are several sources of innovation at the transcriptome and regulatory levels caused by HGT. Here, we concentrate on the degree of genome innovation by HGT for the protein functional repertoire. For protein-coding genes, HGTs can be represented by the protein modules (functional or structural domains) they carry. If a given module is not already present in the receiving genome, a new class of physico-chemical functions becomes accessible to the organism thanks to its transfer. Alternatively, a transfer may carry a protein domain belonging to a family already present in the receiving genome; its contribution to genome innovation is then more similar to gene duplication. These two relative contributions affect the number and size (number of members) of homology classes within a given genome. 7–10

We aim to quantify these trends in the framework of the observed “universal invariants” of genome evolution. 11 At the domain level, it was shown that homology classes in bacteria follow collective trends with genome size. 7,12 For example, the number of domain homology classes is very similar for genomes of similar size, and increases sublinearly with genome size. Furthermore, addition of genes by HGT will affect the relative growth of different functional classes of genes. Molina and van Nimwegen 8,13 showed that functional categories grow collectively with genome size, following a power law with a category-specific exponent. This growth behavior of functional or homology classes can be explained by a model that includes a “rich-get-richer” principle, where the probability of adding a new member to the class is proportional to the class size. Each class then grows at a specific rate, termed “evolutionary potential”. 8

For gene duplications, a rich-get-richer principle trivially follows if we assume that all genes of a given class are a priori equally likely to get duplicated. However, prokaryotes tend to add genes by HGT rather than gene duplication.
1,5,14 One possibility is that HGT could act effectively as a duplication move in a larger cross-genomic gene family pool, but in some cases this pool may not be a blown-up mirror image of the genome in question.

a Università degli Studi di Torino, Dipartimento di Fisica Teorica, Via P. Giuria 1, 10125 Torino, Italy
b I.N.F.N. Torino, Via P. Giuria 1, 10125 Torino, Italy
c Institute for Computer Science, Heinrich-Heine-University, 40225 Düsseldorf, Germany
d Génophysique/Genomic Physics Group, UMR 7238 CNRS “Microorganism Genomics”, France
e University Pierre et Marie Curie, 15 rue de l’École de Médecine, 75006, France. E-mail: marco.cosentino-lagomarsino@upmc.fr
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c2mb05330f
‡ Present address: Physics Department, Sapienza University of Rome, Piazzale Aldo Moro, 5 I-00185 Roma, Italy.

Molecular BioSystems (www.rsc.org/molecularbiosystems)
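
The contrast drawn in the Introduction, between duplication within one genome and transfer from a larger cross-genomic pool, can be illustrated with a toy calculation. The sketch below is a minimal illustration under stated assumptions, not the method used in this study: genomes are represented as plain lists of family labels, every gene is assumed equally likely to be duplicated or transferred, and the function names and example genomes are hypothetical.

```python
from collections import Counter


def duplication_probabilities(genome):
    """Chance that a duplicated gene belongs to each family,
    assuming the duplicated gene is chosen uniformly from the genome."""
    counts = Counter(genome)
    total = sum(counts.values())
    return {family: n / total for family, n in counts.items()}


def transfer_probabilities(pool_genomes):
    """Chance that a transferred gene belongs to each family,
    assuming it is drawn uniformly from all genes in the pooled genomes."""
    counts = Counter()
    for genome in pool_genomes:
        counts.update(genome)
    total = sum(counts.values())
    return {family: n / total for family, n in counts.items()}


def novel_family_probability(recipient, pool_genomes):
    """Chance that a random transfer carries a family absent from the recipient."""
    present = set(recipient)
    probabilities = transfer_probabilities(pool_genomes)
    return sum(p for family, p in probabilities.items() if family not in present)


if __name__ == "__main__":
    # Hypothetical toy data: each letter stands for one domain family.
    recipient = ["A", "A", "A", "B"]
    pool = [["A", "A", "B", "C"], ["A", "B", "D", "D"]]

    print("duplication:", duplication_probabilities(recipient))
    print("transfer:", transfer_probabilities(pool))
    print("novel family by transfer:", novel_family_probability(recipient, pool))
```

In this toy setting both moves are rich-get-richer: a family's chance of gaining a member is proportional to its size, in the genome for duplication and in the pool for transfer. The two differ only to the extent that the composition of the pool differs from that of the recipient genome.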