Web tools for molecular epidemiology of tuberculosis

Amina Shabbeer, Cagri Ozcaglar, Bülent Yener, Kristin P Bennett

Departments of Mathematical Science and Computer Science, Rensselaer Polytechnic Institute, Troy, NY-12180, United States

Article info
Article history: Received 17 June 2011; Received in revised form 14 August 2011; Accepted 19 August 2011; Available online 28 August 2011
Keywords: Molecular epidemiology; Tuberculosis; Spoligotype; MIRU-VNTR; Genomic databases; MTBC classification

Abstract
In this study we explore publicly available web tools designed to use molecular epidemiological data to extract information that can be employed for the effective tracking and control of tuberculosis (TB). The application of molecular methods to the epidemiology of TB complements traditional approaches used in public health. DNA fingerprinting methods are now routinely employed in TB surveillance programs, primarily to detect recent transmission and to support outbreak investigations. Here we present web tools that facilitate systematic analysis of Mycobacterium tuberculosis complex (MTBC) genotype information and provide a view of the genetic diversity in the MTBC population. These tools help answer questions about the characteristics of MTBC strains, such as their pathogenicity, virulence, immunogenicity, transmissibility, drug-resistance profiles and host-pathogen associativity. They provide an integrated platform for researchers to use molecular epidemiological data to address current challenges in the understanding of TB dynamics and the characteristics of MTBC.

© 2011 Published by Elsevier B.V.

1. Introduction

Over the past two decades, the development of methods for the molecular epidemiology of tuberculosis (TB) has helped create a better understanding of this disease and its causative agent, Mycobacterium tuberculosis complex (MTBC).
DNA fingerprinting methods such as spoligotyping, Mycobacterial Interspersed Repetitive Units-Variable Number Tandem Repeats (MIRU-VNTR) typing and IS6110 restriction fragment length polymorphism (RFLP) typing have provided insights into the genetic diversity of the population structure of the MTBC (Mathema et al., 2006). Primarily, these typing methods aid traditional epidemiological approaches in detecting unsuspected transmission links, thus addressing the shortcomings of standard contact tracing methods in identifying transmission events. Since epidemiologically-linked patients have MTBC isolates with identical fingerprints, the fingerprint can serve as a basic tool to distinguish between reactivation of latent infections and recent transmissions, and to identify chains of transmission (CDC, 2011). Additionally, DNA fingerprint data have been useful in population-based studies and have helped develop a deeper understanding of the disease dynamics. Routinely collected genotype information has great potential to yield further insights.

In this study, we explore available web-based tools that may be applied to existing molecular epidemiologic data to address current challenges in TB research. A summary of the tools surveyed in this paper is presented in Table 1 and on the companion website at http://tbinsight.cs.rpi.edu/molepisurvey.html. Throughout this paper, we use surveillance data obtained from the New York State Department of Health (henceforth referred to as NYS), comprising spoligotype and MIRU type information of MTBC strains from patients diagnosed during the period 2004–07. The NYS dataset comprises 674 isolates, with 268 distinct spoligotypes, 361 distinct MIRU types and 500 distinct RFLP patterns. This genotype information, augmented with expert-assigned major lineage labels, is used to explore and test the various tools presented.
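
The fingerprint-matching idea described above can be illustrated with a minimal Python sketch. It groups isolates that share an identical combined fingerprint and counts distinct genotypes, in the same spirit as the summary counts reported for the NYS dataset. The records, field names (`id`, `spoligotype`, `miru`) and function names are hypothetical and chosen only for illustration; they are not taken from the NYS data or from any tool surveyed here.

```python
from collections import defaultdict

# Toy records (hypothetical, NOT from the NYS dataset): an isolate id,
# a spoligotype written as a 15-digit octal code, and a 12-locus MIRU pattern.
isolates = [
    {"id": "A1", "spoligotype": "777777777760771", "miru": "223325153323"},
    {"id": "A2", "spoligotype": "777777777760771", "miru": "223325153323"},
    {"id": "A3", "spoligotype": "000000000003771", "miru": "223325173533"},
    {"id": "A4", "spoligotype": "777777777760771", "miru": "224325153323"},
    {"id": "A5", "spoligotype": "000000000003771", "miru": "223325173533"},
]


def cluster_by_fingerprint(records):
    """Group isolate ids by their combined (spoligotype, MIRU) fingerprint."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["spoligotype"], rec["miru"])].append(rec["id"])
    return dict(groups)


def candidate_clusters(groups):
    """Keep only fingerprints shared by two or more isolates."""
    return {fp: ids for fp, ids in groups.items() if len(ids) >= 2}


groups = cluster_by_fingerprint(isolates)
clusters = candidate_clusters(groups)

# Distinct genotype counts, analogous to the dataset summary in the text.
distinct_spoligotypes = len({rec["spoligotype"] for rec in isolates})
distinct_miru = len({rec["miru"] for rec in isolates})

print("distinct spoligotypes:", distinct_spoligotypes)
print("distinct MIRU types:", distinct_miru)
for fingerprint, ids in clusters.items():
    print(fingerprint, ids)
```

In this toy example, isolates sharing a fingerprint form candidate clusters, while isolate A4 differs at one MIRU locus and remains unclustered. A shared fingerprint is only a starting point for investigation; exact matching on the combined fingerprint is a simplifying assumption of this sketch.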

In Section 2, we provide some background on the molecular methods used in the epidemiology of TB. In subsequent sections, we present tools that can be categorized as follows: databases, transmission and mutation models, classification tools, and visualization tools.

In Section 3, we survey available DNA fingerprint databases that help explore the genetic diversity and bio-geographic distribution of MTBC strains worldwide, and consider potential applications of these data. We also list some databases that investigate MTBC at the detailed genomic level. These databases provide a platform for researchers to share their data and to analyze their results in conjunction with data from other studies.

In Section 4, we look at mathematical models of the transmission and mutation of MTBC strains that use DNA fingerprint information to characterize TB dynamics. We explore the application of these models in detecting potential outbreaks.

Infection, Genetics and Evolution 12 (2012) 767–781. doi:10.1016/j.meegid.2011.08.019
Corresponding author. Tel.: +1 518 276 6899; fax: +1 518 276 4824.
E-mail addresses: shabba@cs.rpi.edu (A. Shabbeer), ozcagc2@cs.rpi.edu (C. Ozcaglar), yener@cs.rpi.edu (B. Yener), bennek@rpi.edu (K.P. Bennett).

In Section 5, we analyze various classification models. Phylogenetic analyses have shown that MTBC strains may be classified into