Genome reduction in prokaryotic obligatory intracellular parasites of humans: a comparative analysis

Kishore R. Sakharkar,1 Pawan Kumar Dhar1 and Vincent T. K. Chow2

Correspondence: Vincent T. K. Chow, micctk@nus.edu.sg

1 BioInformatics Institute, Singapore
2 Human Genome Laboratory, Department of Microbiology, Faculty of Medicine, National University of Singapore, Singapore

Obligatory intracellular parasites have undergone significant genome reduction by gene loss over time in the context of their obligate associations with the host. The flux, streamlining and elimination of genes in these genomes constitute a selective and ongoing process. Comparative analyses of five completely sequenced obligatory intracellular parasite genomes reveal that these genomes display marked similarities in patterns of protein length and frequency distribution, with substantial sharing of a ‘backbone genome’. From category distribution based on the database of clusters of orthologous groups of proteins (COG), it is clear that habitat is a major factor contributing to genome reduction. It is also observed that, in all five obligatory intracellular parasites, the reduction in number of genes/proteins is greater for proteins with lengths of 200–600 amino acids. These comparative analyses highlight that gene loss is function-dependent, but is independent of protein length. These comparisons enhance our knowledge of the forces that drive the extreme specialization of the bacteria and their association with the host.

INTRODUCTION

Obligatory intracellular parasitism serves as an excellent model to study how bacteria exploit the functions of their host cells. Obligatory intracellular parasites possess small genomes and display a tendency towards further genome reduction (Andersson & Andersson, 1999) (Fig. A and Table A, available as supplementary material in IJSEM Online).
Gene degradation appears to be a common feature of obligatory intracellular parasites, targeting overlapping subsets of potentially dispensable genes while adapting to the selective pressures of different niches (Andersson & Andersson, 2001). Therefore, genes found as multiple copies may outline their specific adaptations (Andersson & Kurland, 1998). The flux, streamlining and elimination of genes in genomes of obligate intracellular parasitic species constitute an ongoing process, and may represent a function of lifestyle and genome-coding capacity in terms of compactness. There have been many reports on the evolution of these bacteria from larger genomes by genome deterioration (Andersson et al., 1998). Zomorodipour & Andersson (1999) have provided examples of reductive convergent evolution in the genomes of Rickettsia prowazekii and Chlamydia trachomatis, and have associated this phenomenon with metabolic parasitism in response to intracellular habitat. The high fraction of non-coding DNA in many genomes is speculated to represent ancient genes in the process of elimination (Andersson et al., 1998). Furthermore, pseudogenes are postulated to be genes in the process of becoming lost. Andersson et al. (1998) have shown in Rickettsia that deletions are far more common than insertions, and on average much larger in size. They speculated that once a rickettsial gene becomes non-functional, it will be eliminated from the genome solely by mutational events. The higher proportion of intergenic DNA may be scattered remnants of genes lost in a stepwise process (Lawrence et al., 2001; Tamas et al., 2001; Frank et al., 2002).

C. trachomatis, Chlamydia pneumoniae, Mycobacterium leprae, R. prowazekii and Rickettsia conorii are completely sequenced eubacterial obligate intracellular parasites that are pathogenic for humans (Andersson et al., 1998; Stephens et al., 1998; Kalman et al., 1999; Cole et al., 2001; Ogata et al., 2001).
Despite their similarity in biology and reduced genome size, these species display extreme diversity in tissue tropism and disease expression that hitherto remains a major unanswered question in microbial behaviour.

Published online ahead of print on 23 April 2004 as DOI 10.1099/ijs.0.63090-0.

Bar charts showing a representation of bacterial genome sizes and gene numbers (Fig. A), the distribution of COG categories and their percentage representation in the bacterial genomes (Fig. B) and the percentage change in protein length distributions in the five obligatory intracellular parasites compared to E. coli (Fig. C), and tables listing gene numbers and genome sizes of obligatory intracellular parasites (Table A) and the percentage distribution of different COG categories (Table B) are available as supplementary material in IJSEM Online.

International Journal of Systematic and Evolutionary Microbiology (2004), 54, 1937–1941. DOI 10.1099/ijs.0.63090-0. © 2004 IUMS.

In