Yu et al. BMC Evolutionary Biology 2010, 10:192
http://www.biomedcentral.com/1471-2148/10/192

RESEARCH ARTICLE (Open Access)

© 2010 Yu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model

Zu-Guo Yu 1,2, Ka Hou Chu* 3, Chi Pang Li 3, Vo Anh 1, Li-Qian Zhou 2 and Roger Wei Wang 4

Abstract

Background: The vast sequence divergence among different virus groups presents a great challenge to alignment-based analysis of virus phylogeny. Because of the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment cannot be applied directly to whole-genome comparison and phylogenomic studies of viruses. There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has been successfully applied to the phylogenetic analysis of bacterial and chloroplast genomes.

Results: In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets that differ greatly in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV).

Conclusions: The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also offers some insights into the affiliation of a number of unclassified viruses.
In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses, which have a much smaller genome size.

* Correspondence: kahouchu@cuhk.edu.hk
1 Department of Biology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Full list of author information is available at the end of the article

Background

Viruses were traditionally characterized by morphological features (capsid size, shape, structure, etc.) and by physicochemical and antigenic properties [1]. At the DNA level, the evolutionary relationships of many families and genera have been explored by sequence analysis of single genes or gene families, such as polymerase, capsid and movement genes [1]. The International Committee on Taxonomy of Viruses (ICTV) publishes a report on the virus taxonomy system every five years. Phylogenetic and taxonomic studies of viruses based on complete genome data have become increasingly important as more and more whole viral genomes are sequenced [2-6].

Phylogenies based on single genes or gene families contain ambiguity because horizontal gene transfer (HGT), along with gene duplication and gene capture from hosts, appears to be frequent in large DNA viruses [7-10]. Whether single-gene based analysis can properly infer viral species phylogeny is debatable [2]. One of the unusual aspects of viral genomes is that they exhibit high sequence divergence [7,11].

Several works have attempted to infer viral phylogeny from whole genomes [1,2,4,8,12-19]. Among these studies of genome trees, the alignment-free methods proposed by Gao and Qi [1], Wu et al [2], Gao et al [12] and Stuart et al [16] seem to be sufficiently powerful to resolve the phylogeny of viruses at large evolutionary distances. The present study represents another effort to apply an alignment-free method in analysing complete genome data to eluci-