Yu et al. BMC Evolutionary Biology 2010, 10:192
http://www.biomedcentral.com/1471-2148/10/192
Open Access RESEARCH ARTICLE
© 2010 Yu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work is properly cited.
Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model
Zu-Guo Yu¹,², Ka Hou Chu*³, Chi Pang Li³, Vo Anh¹, Li-Qian Zhou² and Roger Wei Wang⁴
Abstract
Background: The vast sequence divergence among different virus groups has presented a great challenge to
alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing
tools for phylogenetic analysis based on multiple alignment cannot be directly applied to the whole-genome
comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for
phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL)
method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast
genomes.
Results: In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses
and 30 parvoviruses, two data sets that differ greatly in genome size. The trees from our analyses are in good
agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on
Taxonomy of Viruses (ICTV).
Conclusions: The present method provides a new way to recover the phylogeny of large dsDNA viruses and
parvoviruses, and also offers some insights into the affiliation of a number of unclassified viruses. In comparison, some
alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses,
but they are not suitable for resolving the phylogeny of parvoviruses with a much smaller genome size.
Background
Viruses were traditionally characterized by morphological
features (capsid size, shape, structure, etc.) and physicochemical
and antigenic properties [1]. At the DNA level, the
evolutionary relationships of many families and genera have
been explored by sequence analysis of single genes or gene
families, such as polymerase, capsid and movement genes [1].
The International Committee on Taxonomy of Viruses (ICTV)
publishes a report on the virus taxonomy system every five
years. Phylogenetic and taxonomic studies of viruses based on
complete genome data have become increasingly important as
more and more whole viral genomes are sequenced [2-6].
The phylogeny based on single genes or gene families
contains ambiguity because horizontal gene transfer
(HGT), along with gene duplication and gene capture
from hosts, appears to be frequent in large DNA viruses
[7-10]. Whether single-gene-based analysis can properly
infer viral species phylogeny is debatable [2]. One of the
unusual aspects of viral genomes is that they exhibit high
sequence divergence [7,11]. Several works have
attempted to infer viral phylogeny from whole
genomes [1,2,4,8,12-19]. Among these studies of genome
trees, the alignment-free methods proposed by Gao and
Qi [1], Wu et al. [2], Gao et al. [12] and Stuart et al. [16]
seem to be sufficiently powerful to resolve the phylogeny
of viruses at large evolutionary distances. The present
study represents another effort of applying an alignment-
free method in analysing complete genome data to eluci-
* Correspondence: kahouchu@cuhk.edu.hk
¹Department of Biology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Full list of author information is available at the end of the article