Journal of General Virology (2000), 81, 769–780. Printed in Great Britain

Phylogenetic analysis of GBV-C/hepatitis G virus

Donald B. Smith,¹ Miren Basaras,² Simon Frost,³ Dan Haydon,⁴ Narcisa Cuceanu,¹ Linda Prescott,¹ Cara Kamenka,¹ David Millband,¹ Mahomed A. Sathar⁵ and Peter Simmonds¹

¹ Department of Medical Microbiology, University of Edinburgh, Summerhall, Edinburgh EH9 1QH, UK
² Department of Microbiology, School of Medicine, Universidad del Pais Vasco, 48080 Bilbao, Spain
³ Centre for HIV Research, University of Edinburgh, Waddington Building, King’s Buildings, West Mains Road, Edinburgh EH9 3JN, UK
⁴ Centre for Tropical Veterinary Medicine, University of Edinburgh, Easter Bush, Roslin EH25 9RG, UK
⁵ Department of Medicine, University of Natal, Congella, South Africa 4013

Comparison of 33 epidemiologically distinct GBV-C/hepatitis G virus complete genome sequences suggests the existence of four major phylogenetic groupings that are equally divergent from the chimpanzee isolate GBV-Ctro and have distinct geographical distributions. These four groupings are not consistently reproduced by analysis of the virus 5′-noncoding region (5′-NCR), or of individual genes or subgenomic fragments, with the exception of the E2 gene as a whole or of 200–600 nucleotide fragments from its 3′ half. This region is upstream of a proposed anti-sense reading frame and contains conserved potential RNA secondary structures that may be capable of directing the internal initiation of translation. Phylogenetic analysis of this region from certain South African isolates is consistent with previous analysis of the 5′-NCR suggesting that these belong to a fifth group.
The geographical distribution of virus variants is consistent with a long evolutionary history that may parallel that of pre-historic human migrations, implying that the long-term evolution of this RNA virus is extremely slow.

Author for correspondence: Donald Smith. Present address: Garden Cottage, Clerkington, Haddington, East Lothian EH41 4NJ, UK. e-mail Donald.B.Smith@gardencottage.screaming.net

The GenBank accession numbers of the sequences reported here are AF181977–AF181981.

Introduction

The similarity in genome organization between GB virus-C/hepatitis G virus (GBV-C/HGV) and hepatitis C virus (HCV) has led to the naïve expectation that variation of these closely related and persistent flaviviruses might also be similar. However, our limited understanding of the causal reasons for virus variability is underscored by the increasing evidence that these viruses vary in quite different ways. Although both viruses have a similar rate of nucleotide substitution during persistent infection [0.4–1.9 × 10⁻³ for HCV (Major et al., 1999; Smith et al., 1997 a; Okamoto et al., 1992; Ogata et al., 1991), 0.4–2.4 × 10⁻³ for HGV (Khudyakov et al., 1997; Nakao et al., 1997)], GBV-C/HGV lacks a hypervariable region comparable to that present at the NH₂ terminus of the HCV E2 protein (Takahashi et al., 1997 a; Nakao et al., 1997), and observed ratios of synonymous to nonsynonymous substitution are higher for GBV-C/HGV (30 : 1) than within HCV subtypes (9 : 1), although the latter are less divergent (Muerhoff et al., 1997; Smith et al., 1997 b). In addition, while different genotypes of HCV differ by more than 30 %, the most extreme GBV-C/HGV variants differ by only 14 %. Previous studies have identified three (Suzuki et al., 1999; Okamoto et al., 1997), four (Charrel et al., 1999) or five (Takahashi et al., 1997 b) phylogenetic groupings of GBV-C/HGV, although some of these groupings are weak and inconsistent between different studies.
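Percentage divergences of the kind quoted above can be illustrated by counting mismatched positions between two aligned sequences. The following is a minimal sketch only, assuming pre-aligned sequences of equal length and an uncorrected proportion of differing sites (p-distance); the text does not state which distance measure underlies the quoted figures, and the function name `p_distance` and the toy sequences are invented for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') or an
    ambiguous base ('N') are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gapped or ambiguous columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented toy alignment (hypothetical, not real virus sequence data).
ref = "ATGGCAGTCTTACTGCTCCT"
var = "ATGGCTGTCTTGCTGCTC-T"
print(f"{100 * p_distance(ref, var):.1f} % divergence")
```

An uncorrected count such as this underestimates the true number of substitutions between highly divergent sequences, which is why published comparisons often apply a substitution model instead.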
However, whereas HCV genotypes can be distinguished by phylogenetic analysis of a variety of subgenomic regions as small as 222 nt, variants of GBV-C/HGV cannot be reliably identified in this way. Systematic analysis of six complete GBV-C/HGV genome sequences revealed that congruent phylogenetic relationships were obtained for only a minority of 300, 600 and 1200 nt fragments, and that the optimal region was all or part of the 5′-noncoding region (5′-NCR) (Muerhoff et al., 1997; Smith et al., 1997 b).

0001-6655 © 2000 SGM

At present 33 epidemiologically unrelated GBV-C/HGV