The evolutionary dynamics of endogenous retroviruses

Aris Katzourakis, Andrew Rambaut and Oliver G. Pybus
Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, UK

Endogenous retroviruses (ERVs) are vertically transmitted intragenomic elements derived from integrated retroviruses. ERVs can proliferate within the genome of their host until they either acquire inactivating mutations or are lost by recombinational deletion. We present a model that unifies current knowledge of ERV biology into a single evolutionary framework. The model predicts the possible long-term outcomes of retroviral germline infection and can account for the variable patterns of observed ERV genetic diversity. We hope the model will provide a useful framework for understanding ERV evolution, enabling the testing of evolutionary hypotheses and the estimation of parameters governing ERV proliferation.

Endogenous retroviruses in vertebrate genomes

Retroviruses convert their RNA genome into DNA and integrate into the genome of their host in the form of a provirus. Occasionally, proviral integrations occur in germline cells and are therefore transmitted vertically [1]. Most vertebrates contain traces of past retroviral germline integrations, collectively called endogenous retroviruses (ERVs) [2]. After the initial integration, ERVs can copy themselves to different locations within the genome, which gives rise, over long periods of time, to a family of related ERV elements [3]. ERVs are classified amongst retroelements – mobile genetic elements that proliferate within the genomes of their hosts and that use an RNA intermediate during replication. Other retroelements include the human short and long interspersed nuclear elements (SINEs and LINEs).

The explosive growth in available genomic sequence data has enabled the study of ERV diversity and evolution. Using bioinformatics tools, >98 000 human ERVs (HERVs) have been identified, constituting ~5% of the genome [4,5].
These sequences are currently classified into 31 families (or lineages), each resulting from a distinct infection of the germline [6]. Complete phylogenies have been reconstructed for several HERV families; these vary in size, genetic variability and topology. Some HERV phylogenies are unusually ‘star-like’ in shape, with short internal branches and long external branches (Table 1; e.g. Figure 1a,c). The size and shape of HERV family phylogenies are a direct result of past evolutionary processes; therefore, we can use such phylogenies to reconstruct the dynamics of endogenous retroviruses in the human genome.

Here we outline a simple model of ERV evolutionary dynamics, which predicts both the eventual outcome of a germline infection and the expected shapes of ERV family phylogenies, under different evolutionary scenarios. This provides a framework for understanding and measuring the parameters that determine ERV proliferation and loss, and for testing hypotheses regarding ERV evolution. In these models, ERV proliferation and loss are considered on an evolutionary timescale; thus, each gain or loss event represents the effective fixation or removal of an element within the host population.

Mechanisms of ERV proliferation and loss

Most HERV families are thought not to be currently proliferating [4], with the probable exception of HERV-K(HML2) [7–9], although ERV families in other

Table 1.
The variable size and shape of HERV family phylogenies ᵃ

| HERV family | Number of HERV elements | Phylogeny imbalance (B1 statistic) ᵇ | Phylogeny starlike-ness (γ statistic) ᶜ | Phylogeny depth (genetic distance from root to tips) ᵈ |
| --- | --- | --- | --- | --- |
| HERV-E | 34 | 15.9 (15.4, 16.5)* | −3.6 (−3.8, −3.4)* | 0.144 (0.139, 0.150) |
| HERV-F(b) | 23 | 10.7 (9, 12.2) | −7.1 (−7.2, −6.9)* | 0.098 (0.094, 0.104) |
| HERV-K(HML5) | 37 | 19.3 (18.5, 20.2) | −8.1 (−8.3, −8)* | 0.122 (0.118, 0.125) |
| HERV-S | 16 | 8.2 (6.8, 9) | −5.4 (−5.6, −5.3)* | 0.142 (0.135, 0.150) |
| HERV-K(HML2) | 44 | 16.7 (15.2, 18.4)* | −0.9 (−1.1, −0.7) | 0.096 (0.091, 0.100) |

ᵃ The B1 statistic measures tree imbalance [49]. Imbalanced trees are ‘comb-like’, that is, the number of tips falling on either side of each branching point tends to be different (e.g. Figure 1e). The γ statistic measures tree starlike-ness [22]. Starlike trees have short internal branches and long terminal branches (e.g. Figure 1a,c). Asterisks indicate values that reject a simple null model of phylogeny shape (the constant-rate birth process) [22]. Upper and lower confidence limits are shown in parentheses. Values were obtained from previously published HERV sequence alignments [7,11] using a Bayesian Markov Chain Monte Carlo (MCMC) approach, implemented in the computer program BEAST [50].
ᵇ The B1 imbalance statistic depends on tree size and can be compared directly only between trees with the same number of elements.
ᶜ Negative γ values are more starlike.
ᵈ Units are expected nucleotide substitutions per site.

Corresponding author: Pybus, O.G. (oliver.pybus@zoo.ox.ac.uk).
Available online 16 August 2005
Opinion TRENDS in Microbiology Vol.13 No.10 October 2005
www.sciencedirect.com 0966-842X/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tim.2005.08.004
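
The starlike-ness statistic (γ) reported in Table 1 can be illustrated with a minimal sketch. It assumes the standard internode-interval definition of γ, under which a constant-rate birth process gives values centred on zero, and it takes as input a plain list of the times spent with 2, 3, ..., n lineages. The function name `gamma_statistic` and the input layout are hypothetical choices made for illustration; the values in Table 1 were obtained with BEAST, not with this code.

```python
import math


def gamma_statistic(intervals):
    """Compute a starlike-ness statistic from internode intervals.

    intervals[i] is the time during which the tree has i + 2 lineages,
    ordered from the root towards the tips (hypothetical input layout).
    """
    n = len(intervals) + 1  # number of tips
    if n < 3:
        raise ValueError("at least three tips are required")

    # k * g_k for k = 2..n: the lineage-weighted length of each interval.
    weighted = [(i + 2) * g for i, g in enumerate(intervals)]
    total = sum(weighted)  # T, the summed branch length of the tree

    # Sum over i = 2..n-1 of the cumulative weighted length up to interval i.
    running = 0.0
    cumulative_sum = 0.0
    for w in weighted[:-1]:
        running += w
        cumulative_sum += running

    mean_cumulative = cumulative_sum / (n - 2)
    return (mean_cumulative - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


# Early branching followed by long terminal branches (a starlike tree).
starlike = gamma_statistic([0.01, 0.01, 0.01, 1.0])
# Late branching, with most events close to the tips.
tippy = gamma_statistic([1.0, 0.01, 0.01, 0.01])
print(starlike < 0, tippy > 0)
```

In this sketch, intervals proportional to 1/k (the expectation under a constant-rate birth process) give a value of zero, whereas trees whose branching events are concentrated near the root give negative values, which matches the reading of Table 1 that negative values are more starlike.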