Empirical evaluation of partitioning schemes for phylogenetic analyses of mitogenomic data: An avian case study

Alexis F.L.A. Powell (corresponding author), F. Keith Barker, Scott M. Lanyon

Department of Ecology, Evolution and Behavior, Bell Museum of Natural History, 100 Ecology Building, 1987 Upper Buford Circle, University of Minnesota, St. Paul, MN 55108, USA

Article history: Received 29 May 2012; revised 8 September 2012; accepted 8 September 2012; available online 18 September 2012

Keywords: Avian; RNA structure; Dataset partitioning; Mitochondrial genome; Phylogenetic analysis

Abstract

Whole mitochondrial genome sequences have been used in studies of animal phylogeny for two decades, and current technologies make them ever more available, but methods for their analysis are lagging and best practices have not been established. Most studies ignore variation in base composition and evolutionary rate within the mitogenome that can bias phylogenetic inference, or attempt to avoid it by excluding parts of the mitogenome from analysis. In contrast, partitioned analyses accommodate heterogeneity, without discarding data, by applying separate evolutionary models to differing portions of the mitogenome. To facilitate use of complete mitogenomic sequences in phylogenetics, we (1) suggest a set of categories for dividing mitogenomic datasets into subsets, (2) explore differences in evolutionary dynamics among those subsets, and (3) apply a method for combining data subsets with similar properties to produce effective and efficient partitioning schemes. We demonstrate these procedures with a case study, using the mitogenomes of species in the grackles and allies clade of New World blackbirds (Icteridae). We found that the most useful categories for partitioning were codon position, RNA secondary structure pairing, and the coding/noncoding distinction, and that a scheme with nine data groups outperformed all of the more complex alternatives (up to 44 data groups) that we tested.
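The general shape of steps (1) and (3), assigning alignment sites to categories and then combining categories with similar properties, can be sketched in code. This is a minimal, hypothetical illustration, not the authors' implementation: the category labels and the helpers `site_category`, `tally`, `bic`, and `greedy_merge` are invented for this example. It also swaps in a toy likelihood, a single multinomial over nucleotide frequencies per subset, in place of a real phylogenetic substitution model; an actual analysis would instead optimize a substitution model on a tree for each subset. The combining step uses greedy pairwise merging scored by the Bayesian information criterion (BIC). This is one common strategy, and the text here does not state that it is the criterion or search the authors used.

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import combinations
from typing import Iterable, Optional


def site_category(region: str, codon_pos: Optional[int] = None,
                  paired: Optional[bool] = None) -> str:
    """Assign one alignment column to a hypothetical partitioning category."""
    if region == "protein":          # protein-coding gene: split by codon position
        return f"codon{codon_pos}"
    if region == "rna":              # rRNA/tRNA: split by secondary-structure pairing
        return "rna_stem" if paired else "rna_loop"
    return "noncoding"               # anything else (e.g. control region, spacers)


def tally(columns: Iterable[tuple[str, str]]) -> dict[str, Counter]:
    """Pool unambiguous nucleotide counts per category from (category, column) pairs."""
    subsets: dict[str, Counter] = {}
    for category, column in columns:
        bases = (b for b in column.upper() if b in "ACGT")  # skip gaps and ambiguity codes
        subsets.setdefault(category, Counter()).update(bases)
    return subsets


def multinomial_loglik(counts: Counter) -> float:
    """Maximized log-likelihood of base counts under one multinomial (toy model only)."""
    n = sum(counts.values())
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)


def bic(scheme: dict[str, Counter]) -> float:
    """BIC of a scheme in which each subset has its own 3 free base frequencies."""
    loglik = sum(multinomial_loglik(c) for c in scheme.values())
    n_params = 3 * len(scheme)
    n_obs = sum(sum(c.values()) for c in scheme.values())
    return -2.0 * loglik + n_params * math.log(n_obs)


def greedy_merge(scheme: dict[str, Counter]) -> tuple[dict[str, Counter], float]:
    """Repeatedly merge the pair of subsets whose union lowers BIC the most."""
    scheme = dict(scheme)
    best = bic(scheme)
    while len(scheme) > 1:
        candidates = []
        for a, b in combinations(sorted(scheme), 2):
            trial = {k: v for k, v in scheme.items() if k not in (a, b)}
            trial[f"{a}+{b}"] = scheme[a] + scheme[b]   # pool the two subsets' counts
            candidates.append((bic(trial), trial))
        score, trial = min(candidates, key=lambda item: item[0])
        if score >= best:            # no remaining merge improves BIC: stop
            break
        best, scheme = score, trial
    return scheme, best


if __name__ == "__main__":
    # Invented counts: two subsets with identical composition and one skewed subset.
    example = {
        "codon1": Counter(A=100, C=100, G=100, T=100),
        "rna_stem": Counter(A=100, C=100, G=100, T=100),
        "codon3": Counter(A=400, C=10, G=10, T=10),
    }
    merged, score = greedy_merge(example)
    print(sorted(merged), round(score, 1))
```

With these invented counts, merging the two subsets of identical composition leaves the likelihood unchanged and removes three parameters, so BIC favors combining them. The strongly skewed subset stays separate because pooling it would cost far more likelihood than the parameters it saves.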
As hoped, we found that analyses using whole mitogenomic sequences yielded much better-resolved and more strongly supported hypotheses of the phylogenetic history of that locus than did a conventional 2-kilobase sample (i.e. sequences of the cytochrome b and ND2 genes). Mitogenomes have much untapped potential for phylogenetics, especially of birds, a taxon for which they have been little exploited except in investigations of ordinal-level relationships.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

Mitochondrial genomes (mitogenomes) are an attractive source of data for molecular phylogenetic studies of animal taxa. Because of their rapid time to coalescence, relatively high substitution rates, and large size (approximately 17,000 bp), mitogenomes are more likely than other loci to evolve in concert with, and harbor evidence of, the population histories of species (Moore, 1995). Moreover, their high copy number, haploidy, and lack of recombination make mitogenomes especially easy to obtain, sequence, and analyze (Avise, 1998; Berlin et al., 2004). Given their merits, we contend that mitochondrial DNA (mtDNA) sequences should be included as one marker among many (Fisher-Reid and Wiens, 2011) in coalescent-based "species tree" and other multilocus analyses, rather than abandoned for phylogeny construction, as some have advocated (e.g. Ballard and Whitlock, 2004; Galtier et al., 2009; reviewed by Rubinoff and Holland, 2005). As technological advances reduce the cost and difficulty of sequencing large numbers of nuclear loci, the use of mitogenomes should increase concomitantly, because they too are becoming more readily acquired, whether intentionally or as by-products of genomic sequencing (e.g. Nabholz et al., 2010).
Consequently, we argue that the routine practice of using only 1–2 kilobases of mtDNA sequence in phylogenetic analyses should be replaced by the use of whole mitogenomes, so as to take full advantage of the potential resolving power of the locus, especially for groups of closely related organisms in which genetic distances are small.

Although mitogenomic data have great potential, standards for their rigorous and objective use in phylogenetic analyses are currently lacking. Particularly relevant to developing best methods for phylogenetic analyses of mitogenomes is their heterogeneity in base composition and evolutionary rate at various scales across the molecule (Anderson et al., 1982; Cummings et al., 1995), which suggests that such analyses should benefit from data partitioning (Yang, 1996; Nylander et al., 2004). Partitioning improves model fit by dividing alignments into relatively homogeneous sets of sites and then selecting and optimizing a substitution model for each set independently. Nevertheless, data partitioning is not widely used with mitogenomes. To survey current practice, we examined 71

Molecular Phylogenetics and Evolution 66 (2013) 69–79
http://dx.doi.org/10.1016/j.ympev.2012.09.006
Corresponding author: A.F.L.A. Powell. Fax: +1 612 624 6777. E-mail address: alveypowell@yahoo.com