Phylogenetic Analysis as a Tool in Molecular Epidemiology of Infectious Diseases

BARRY G. HALL AND MIRIAM BARLOW

Phylogenetics is a powerful tool for microbial epidemiology, but it is often misused and misinterpreted in the field. Microbial epidemiologists are cautioned that a phylogenetic tree must be correctly rooted before any inferences can be drawn about the order of descent from a common ancestor. Epidemiological samples of microbial populations typically include both ancestors and their descendants. To illustrate the relationships among those isolates, the phylogenetic method used must be able to detect zero-length branches. The Unweighted Pair-Group Method with Arithmetic mean (UPGMA) is the phylogenetic method most widely used in microbial epidemiology. UPGMA cannot detect zero-length branches, and it places the root of the tree based on a usually false assumption (that evolutionary rates are constant across lineages). For both reasons, UPGMA is the worst possible choice among the several phylogenetic methods available. Microbial epidemiology deals with relationships among strains within a species rather than among species, so recombination within those species can render phylogenetic trees meaningless and positively misleading. When there is evidence of significant recombination within the species of interest, phylogenetic trees should not be used at all. Instead, alternative tools such as eBURST should be used to understand relationships among isolates.

Ann Epidemiol 2006;16:157–169. © 2006 Elsevier Inc. All rights reserved.

KEY WORDS: Phylogenetic Trees, UPGMA, Microbial Epidemiology, Recombination, eBURST.

INTRODUCTION

The goal of an epidemiologic study is generally to determine the cause or source of some health-related phenomenon that affects a population, and the distribution of that phenomenon throughout the population (1).
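A toy example shows the abstract's objection to UPGMA. The Python below is a minimal sketch and not the authors' code. The helper names `upgma` and `three_taxon_branch` are hypothetical. The distances are also invented for illustration: they count differing sites among three hypothetical isolates A, B and C, where A is the true ancestor and B and C each carry one different mutation.

```python
from itertools import combinations


def upgma(dist):
    """Build a UPGMA tree from {(a, b): distance}; return a Newick string.

    Every new node is placed at height d/2 above the tips, i.e. UPGMA assumes
    a strict molecular clock and always yields an ultrametric, rooted tree.
    """
    d = {}  # order-independent pairwise distances between clusters
    names = set()
    for (a, b), v in dist.items():
        d[frozenset((a, b))] = v
        names.update((a, b))
    # cluster label (its Newick text) -> (number of leaves, height of its root)
    clusters = {n: (1, 0.0) for n in sorted(names)}
    while len(clusters) > 1:
        # Join the closest pair of clusters (the first pair found wins ties).
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (sa, ha), (sb, hb) = clusters.pop(a), clusters.pop(b)
        h = d[frozenset((a, b))] / 2
        new = f"({a}:{h - ha:g},{b}:{h - hb:g})"
        # Average linkage: size-weighted mean of the children's distances.
        for c in clusters:
            d[frozenset((new, c))] = (
                sa * d[frozenset((a, c))] + sb * d[frozenset((b, c))]
            ) / (sa + sb)
        clusters[new] = (sa + sb, h)
    return next(iter(clusters)) + ";"


def three_taxon_branch(dist, x, y, z):
    """Length of x's branch on the unrooted 3-taxon tree (no clock assumed)."""
    def g(p, q):
        return dist[(p, q)] if (p, q) in dist else dist[(q, p)]
    return (g(x, y) + g(x, z) - g(y, z)) / 2


# Hypothetical isolates: A is the ancestor; B and C each differ from A
# at one (different) site, so B and C differ from each other at two sites.
dist = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 2}
print(upgma(dist))                              # (C:0.75,(A:0.5,B:0.5):0.25);
print(three_taxon_branch(dist, "A", "B", "C"))  # 0.0
```

In the UPGMA tree, the ancestor A appears as an ordinary tip on a 0.5-unit branch, paired with B. It cannot be distinguished from a descendant, and the root sits where the clock assumption places it. The clock-free three-taxon formula, (d_AB + d_AC − d_BC)/2, assigns A's branch a length of zero. That zero-length branch is the signal that a sampled isolate is itself an ancestor.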
The source of many chronic, behavioral, or noninfectious diseases can be determined by studying the attributes, behaviors, and environment of the population of interest. Tracking the source of an infectious disease within a population by these methods is often much more difficult, for several reasons: the pool of infected individuals turns over (2), clinical laboratories have limited resources for identifying and reporting cases (2, 3), numerous infectious diseases cause similar symptoms (4), and many infected people do not seek treatment for their infections (5). For over 30 years, molecular epidemiology has served as an important tool for studying the spread of infectious diseases (4). Restriction fragment length polymorphisms (RFLP), randomly amplified polymorphic DNA (RAPD), and, more recently, multilocus sequence typing (MLST) have been used to determine the relatedness of bacterial strains (4). These methods are useful for determining whether an infectious agent has a single source or multiple sources. By themselves, however, they are limited in their ability to identify the source of an infectious strain, because they tell us nothing about the direction in which evolution has occurred. For example, if a noninfectious strain is closely related to an infectious strain, none of these methods tells us whether the infectious strain was derived from the noninfectious strain or vice versa. Phylogenetic methods can analyze nucleotide sequence data, such as those generated by MLST, in a way that determines the order of descent of related strains. When coupled with appropriate phylogenetic analysis, molecular epidemiology has the potential to elucidate the mechanisms that lead to microbial outbreaks and epidemics.
From the Biology Department, University of Rochester, Rochester, NY (B.G.H.); the Bellingham Research Institute, Bellingham, WA (B.G.H.); and the Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA (M.B.).
Address correspondence to: Barry G. Hall, Bellingham Research Institute, 218 Chuckanut Point Rd., Bellingham, WA 98229. E-mail: drbh@mail.Rochester.edu.
Received March 24, 2005; accepted April 20, 2005.
1047-2797/06/$–see front matter
doi:10.1016/j.annepidem.2005.04.010

Phylogenetics is useful, and inexpensive, readily available software and manuals exist for phylogenetic analysis. Even so, phylogenetic methods are often inappropriately applied. Even when appropriately applied, they are often poorly explained and therefore poorly understood. Phylogenetic analysis is inexpensive, especially when sequence data are already available, and it shows how infectious agents are spreading and evolving much more clearly than sequence data alone can. It is therefore important for molecular epidemiologists to understand, correctly apply, and correctly interpret phylogenies and phylogenetic methods. This review, while not comprehensive, is intended