Interpreting genotype cluster sizes of Mycobacterium tuberculosis isolates typed with IS6110 and spoligotyping

Fabio Luciani a,*, Andrew R. Francis b, Mark M. Tanaka a,c

a School of Biotechnology and Biomolecular Sciences, University of New South Wales, NSW 2052, Australia
b School of Computing and Mathematics, University of Western Sydney, Australia
c Evolution & Ecology Research Centre, University of New South Wales, Australia

Received 10 September 2007; received in revised form 12 December 2007; accepted 12 December 2007
Available online 1 February 2008

Abstract

Molecular techniques such as IS6110-RFLP typing and spacer oligonucleotide typing (spoligotyping) have aided in understanding the transmission patterns of Mycobacterium tuberculosis. The degree of clustering of isolates on the basis of genotypes is informative of the extent of transmission in a given geographic area. We analyzed 130 published data sets of M. tuberculosis isolates, each representing a sample of bacterial isolates from a specific geographic region, typed with either or both of the IS6110-RFLP and spoligotyping methods. We explored common features and differences among these samples. Using population models, we found that the presence of large clusters (typically associated with recent transmission) as well as a large number of singletons (genotypes found exactly once in the data set) is consistent with an expanding infectious population. We also estimated the mutation rate of spoligotype patterns relative to IS6110 patterns and found the former rate to be about 10–26% of the latter. This study illustrates the utility of examining the full distribution of genotype cluster sizes from a given region, in the light of population genetic models.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Tuberculosis; Population genetics; Genetic diversity; Evolution; Molecular epidemiology; Mutation; Genotype; IS6110; Spoligotype

1. Introduction

Tuberculosis remains a major global infectious disease, causing around two million deaths each year. While traditional epidemiological methods such as contact tracing are central to attempts to contain tuberculosis, these are increasingly complemented by use of data from molecular typing techniques (Small et al., 1994; Kodmon et al., 2006; Lari et al., 2005; Quitugua et al., 2002; Chan-Yeung et al., 2003; Van Soolingen et al., 1999; Mathema et al., 2006). DNA fingerprinting methods have enabled the classification of isolates into distinct strains, and thus the characterization of genetic diversity of Mycobacterium tuberculosis in outbreaks (Nguyen et al., 2004; Van Soolingen, 2001).

The most commonly applied molecular typing techniques are IS6110-restriction fragment length polymorphism (IS6110-RFLP) (Eisenach et al., 1988), and spacer oligonucleotide typing (spoligotyping) (Kamerbeek et al., 1997). The former uses variability produced by movement of the insertion sequence IS6110, while the latter identifies the presence or absence of 43 DNA spacer sequences located between repetitive units at the direct repeat (DR) locus. The IS6110-RFLP and spoligotyping methods are sufficiently discriminating to separate unrelated isolates, and yet the resulting patterns are stable enough to allow closely related isolates to be grouped (Niemann et al., 1999; Van Soolingen, 2001). The use of these typing techniques to study the epidemiology of tuberculosis has now become widespread, with over 150 papers appearing in 2006 alone that mention IS6110 or spoligotyping (Fig. 1).

Clusters of identical or highly similar genotypes are widely interpreted as resulting from recent transmission caused by a single case. On the other hand, genotypes appearing uniquely within a data set, named singletons from here on, are generally considered to have arisen from migration or recent reactivation of remotely acquired infections.
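To make the terms cluster and singleton concrete, the following minimal Python sketch tabulates a genotype cluster size distribution from a list of isolate genotype labels. It only illustrates the definitions used above and is not the analysis performed in this study. The function names, the labels, and the sample data are hypothetical, and the sketch assumes that isolates belong to the same cluster only when their labels are identical (it does not group "highly similar" patterns).

```python
from collections import Counter


def cluster_size_distribution(genotypes):
    """Return {cluster size: number of distinct genotypes of that size}."""
    sizes = Counter(genotypes)  # genotype label -> number of isolates
    return dict(sorted(Counter(sizes.values()).items()))


def summarize(genotypes):
    """Return basic clustering summaries for one sample of isolates."""
    sizes = Counter(genotypes)
    n_isolates = len(genotypes)
    # A singleton is a genotype found exactly once in the data set.
    singletons = sum(1 for size in sizes.values() if size == 1)
    return {
        "isolates": n_isolates,
        "genotypes": len(sizes),
        "singletons": singletons,
        "clustered_isolates": n_isolates - singletons,
        "largest_cluster": max(sizes.values(), default=0),
    }


# Hypothetical sample: seven isolates carrying four distinct genotype labels.
sample = ["A", "A", "A", "B", "C", "C", "D"]
print(cluster_size_distribution(sample))  # {1: 2, 2: 1, 3: 1}
print(summarize(sample))
```

In this toy sample, two genotypes are singletons and the remaining five isolates fall into clusters of sizes 2 and 3.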
In addition to transmission rates, several other factors influence the patterns of clustering.

Infection, Genetics and Evolution 8 (2008) 182–190. doi:10.1016/j.meegid.2007.12.004
* Corresponding author. Tel.: +61 2 9385 3701; fax: +61 2 9385 1483. E-mail address: luciani@unsw.edu.au (F. Luciani).