Submitted: December 14, 2007

Phylogenetic Diversity on Split Networks

Bui Quang Minh, Steffen Klaere and Arndt von Haeseler

Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Veterinary University of Vienna

INI Preprint Number: N107090

Abstract

In biodiversity conservation, one is interested in selecting a subset of taxa for preservation priority. Phylogenetic diversity (PD) provides a quantitative measure for taxon selection on phylogenetic trees. In particular, PD is the total length of the minimal subtree induced by the selected taxa. Recently, it has been shown that on trees the maximal PD score and the corresponding subset of taxa can be computed by a greedy algorithm. However, if evolution is not treelike and networks are a more appropriate illustration of phylogenetic relationships, then the greedy strategy no longer works. Here, we extend the notion of PD to phylogenetic networks. To this end, we suggest a dynamic programming algorithm (PD-NET) which guarantees the computation of optimal PD scores and PD sets for circular networks, a commonly encountered category of networks. PD-NET has polynomial time complexity. Finally, we apply PD-NET to biological data and compare the resulting PD sets to the selection of taxa derived from a tree. The outcome indicates that it is advisable to also include non-treelike effects when dealing with conservation questions.

Keywords: phylogenetic diversity, dynamic programming, phylogenetic network, split system, biodiversity conservation.

Introduction

Biodiversity embraces the variety of life from plants to animals, from micro- to macro-organisms, from genes to genomes and ecosystems. Conservation planning for biodiversity is the subject of many research projects and intense discussions (e.g., Wilson, 1997; Gaston and Spicer, 2004).
In recent decades, the diversity of a set of taxa has been primarily measured by genetic distance (Vane-Wright et al., 1991), i.e., by the discrepancy between the genetic information of taxa. In particular, one is interested in selecting a subset of k representative taxa that maximizes the total genetic distance of all evolutionary lineages spanned by these taxa. This concept was further extended to comparative genomics for prioritizing taxa in sequencing projects (Pardi and Goldman, 2005).
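To make the selection problem concrete, the following is a minimal sketch of the PD score on a small tree and an exhaustive search for the best subset of k taxa. It is an illustration under stated assumptions, not the PD-NET algorithm or the greedy algorithm referred to in the abstract: the four-taxon tree, its edge lengths, and the names `TAXA`, `EDGES`, `pd_score` and `best_pd_set` are hypothetical, and the tree is treated as unrooted, so an edge counts only if selected taxa lie on both of its sides.

```python
from itertools import combinations

# Hypothetical toy tree on four taxa (an assumption, not data from the paper).
# Each edge is stored as (taxa on one side of the edge, edge length).
TAXA = frozenset({"A", "B", "C", "D"})
EDGES = [
    (frozenset({"A"}), 1.0),
    (frozenset({"B"}), 2.0),
    (frozenset({"C"}), 3.0),
    (frozenset({"D"}), 4.0),
    (frozenset({"A", "B"}), 5.0),  # internal edge separating {A, B} from {C, D}
]


def pd_score(selected):
    """Total length of the edges in the minimal subtree connecting `selected`."""
    selected = frozenset(selected)
    total = 0.0
    for side, length in EDGES:
        # An edge belongs to the subtree iff selected taxa occur on both sides.
        if selected & side and selected - side:
            total += length
    return total


def best_pd_set(k):
    """Exhaustive search over all k-subsets; only feasible for tiny examples."""
    return max(combinations(sorted(TAXA), k), key=pd_score)


if __name__ == "__main__":
    for k in (2, 3):
        chosen = best_pd_set(k)
        print(k, chosen, pd_score(chosen))
```

The exhaustive search examines every subset of size k and therefore grows combinatorially with the number of taxa; it serves only to show what is being optimized.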