New Invariant of DNA Sequence Based on 3DD-Curves and Its Application on Phylogeny

XIZHEN ZHANG, JIAWEI LUO, LI YANG

School of Computer and Communication, Hunan University, Changsha, Hunan 410082, China

Received 23 January 2007; Revised 21 March 2007; Accepted 23 March 2007
DOI 10.1002/jcc.20760
Published online 2 May 2007 in Wiley InterScience (www.interscience.wiley.com).

Abstract: We propose Z_inv, a new invariant based on 3DD-curves of DNA sequences. It is simple to calculate and approximates the leading eigenvalue of the matrix associated with a DNA sequence. The utility of the invariant is illustrated on the DNA sequences of 11 species. In this study, we also use Z_inv to analyze the phylogenetic relationships among the seven HA (H5N1) sequences of avian influenza virus.

© 2007 Wiley Periodicals, Inc. J Comput Chem 28: 2342–2346, 2007

Key words: DNA; Z_inv; 3DD-curves; phylogeny; HA (H5N1)

Introduction

The rapid growth of databases of DNA primary sequence data has led to a search for methods that characterize these data numerically. Graphical representations of DNA sequences based on 3D curves have been given by Hamori, Zhang, Randic, Liao, Li, and others. Hamori and Ruskin [1] first mapped a DNA sequence into a three-dimensional curve (the H curve), which is especially suited to long sequences. Zhang and Zhang [2] introduced a three-dimensional curve (the Z curve) that is unique for a given DNA sequence, in the sense that each can be uniquely reconstructed from the other. Randic outlined the construction of a 3D graphical representation of DNA primary sequences [3] and proteomics maps [4]. By assigning different vectors to the four nucleotides A, G, C, and T, Liao obtained different 3D representations [5–7], in which a DNA sequence is determined by any pair of its three characteristic curves.
Liao [8] applied a nondegenerate 3D representation to construct a phylogenetic tree, and Wang [9] outlined a graphical method for constructing a phylogenetic tree based on a unique 3D representation. Of the authors mentioned above, Randic, Liao, and Wang associated a DNA sequence of n bases with an n × n non-negative real symmetric matrix and used its leading eigenvalue to characterize the DNA sequence [3–9]. Zheng used the geometrical centre and the covariance matrix to reflect the centre position and the distribution of the curve, respectively; distances are obtained by calculating the leading eigenvalues and the cosine between the corresponding eigenvectors [10]. Li proposed a new invariant, the ALE-index, which is an approximate value of the leading eigenvalue [11]. Zhang outlined a new invariant, inv, based on 2DD-curves [12].

In this study, we propose a new invariant of DNA sequences based on nondegenerate 3DD-curves (Y. Zhang, B. Liao), which is simple to calculate. The utility of our invariant Z_inv is illustrated on the DNA sequences of 11 species, and we use Z_inv to construct the phylogenetic tree for eight HA (H5N1) genomes.

Contract/grant sponsor: Item Foundation of Hunan Provincial Department of Finance ([2005]90)
Contract/grant sponsor: Hunan Provincial Natural Science Foundation of China; contract grant number: 06JJ4076
Correspondence to: X. Zhang; e-mail: chunagp@hotmail.com

Invariants of DNA Sequences

Definition of Invariants

A DNA sequence with n bases can always be associated with an n × n non-negative real symmetric matrix whose diagonal elements are zero. Let $M = (a_{ij})_{n \times n}$ be such a matrix, so that $a_{ij} \ge 0$, $a_{ij} = a_{ji}$, and $a_{ii} = 0$ for $i, j = 1, 2, \ldots, n$. The leading eigenvalue $\lambda_1(M)$ is an important invariant and has been used effectively to analyze the similarity of DNA sequences. To simplify the calculation, Li Chun [11] proposed a new sequence invariant, the ALE-index $\chi$, which approximates $\lambda_1(M)$:

$$
\chi = \chi(M) = \frac{1}{2}\left( \frac{1}{n}\,\|M\|_{m_1} + \sqrt{\frac{n-1}{n}}\,\|M\|_{F} \right) \tag{1}
$$
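As an illustration of Equation (1), the following is a minimal Python sketch that compares the ALE-index with the exact leading eigenvalue of a small matrix. It assumes the usual definitions of the two norms, namely that the m1-norm is the sum of the absolute values of all entries and the F-norm is the Frobenius norm; these definitions, the function names `ale_index` and `leading_eigenvalue`, and the toy matrices are assumptions made for this example and are not taken from the text above.

```python
import numpy as np


def ale_index(M):
    """ALE-index of Equation (1) for a symmetric, non-negative, zero-diagonal matrix.

    Assumes ||M||_m1 = sum of |a_ij| and ||M||_F = sqrt(sum of a_ij^2).
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    m1_norm = np.abs(M).sum()              # assumed m1-norm
    fro_norm = np.sqrt((M ** 2).sum())     # Frobenius norm
    return 0.5 * (m1_norm / n + np.sqrt((n - 1) / n) * fro_norm)


def leading_eigenvalue(M):
    """Largest eigenvalue of a real symmetric matrix."""
    M = np.asarray(M, dtype=float)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return float(np.linalg.eigvalsh(M)[-1])


if __name__ == "__main__":
    # Toy matrix: zeros on the diagonal, ones elsewhere (not real sequence data)
    K4 = np.ones((4, 4)) - np.eye(4)
    print("ALE-index:         ", ale_index(K4))
    print("leading eigenvalue:", leading_eigenvalue(K4))

    # A hand-made symmetric matrix with unequal entries, also hypothetical
    M = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 1.5],
                  [2.0, 1.5, 0.0]])
    print("ALE-index:         ", ale_index(M))
    print("leading eigenvalue:", leading_eigenvalue(M))
```

For the 4 × 4 toy matrix with zeros on the diagonal and ones elsewhere, both quantities equal 3, so the approximation is exact in that case. For general matrices the two values differ, and the ALE-index only requires sums over the entries rather than an eigenvalue decomposition.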