The compositional transition of vertebrate genomes: an analysis of the secondary structure of the proteins encoded by human genes

Giuseppe D’Onofrio a,*, Tapash Chandra Ghosh b

a Laboratorio di Evoluzione Molecolare, Stazione Zoologica A. Dohrn, 80121 Napoli, Italy
b Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII-M, Kolkata 700 054, India

Received 21 October 2004; received in revised form 12 November 2004; accepted 23 November 2004
Available online 7 January 2005
Received by H.E. Roman

Abstract

Fluctuations and increments of both C3 and G3 levels along human coding sequences were investigated by comparing two sets of Xenopus/human orthologous genes. The first set shows minor differences in GC3 levels between the two species; the second shows considerable increments of GC3 levels in the human genes. In both data sets, the fluctuations of C3 and G3 levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed different increments of C3 and G3 levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids was also affected during the compositional transition, showing that RSCU correlates with protein secondary structure in human genes. The importance of natural selection for the formation of the isochore organization of the human genome is discussed on the basis of these results.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Isochore; Base composition; Codon usage; Natural selection; Biased gene conversion; Protein structure

1. Introduction

Fluctuations of the guanine and cytosine levels at first, second, and third codon positions were first observed by a sliding window analysis of several prokaryotic and eukaryotic genes (Wada and Suyama, 1985).
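As an illustration of the quantities involved, the following is a minimal Python sketch of position-specific base levels and of a sliding-window GC3 profile. It is not the procedure of Wada and Suyama; the function names, the window length of 10 codons, and the step of 1 codon are assumptions chosen only for this example.

```python
def codon_position_levels(cds):
    """Return G3, C3, GC3 and GC1+2 (percent) for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence shorter than one codon")
    g3 = sum(c[2] == "G" for c in codons)   # G at third codon position
    c3 = sum(c[2] == "C" for c in codons)   # C at third codon position
    gc12 = sum(b in "GC" for c in codons for b in c[:2])  # G or C at positions 1 and 2
    return {
        "G3": 100.0 * g3 / n,
        "C3": 100.0 * c3 / n,
        "GC3": 100.0 * (g3 + c3) / n,
        "GC1+2": 100.0 * gc12 / (2 * n),
    }


def sliding_gc3(cds, window_codons=10, step_codons=1):
    """GC3 (percent) in overlapping windows; window and step are in codons.

    Both parameters are illustrative assumptions, not values from the paper.
    """
    n = len(cds) // 3
    profile = []
    for start in range(0, n - window_codons + 1, step_codons):
        chunk = cds[3 * start:3 * (start + window_codons)]
        profile.append(codon_position_levels(chunk)["GC3"])
    return profile


if __name__ == "__main__":
    toy = "ATGGCCGCGAAATTT"  # hypothetical 5-codon sequence
    print(codon_position_levels(toy))
    print(sliding_gc3(toy, window_codons=2))
```

Running the profile with different values of `window_codons` on the same sequence gives profiles of different length and shape, which makes the dependence on the chosen window length easy to see.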
Wada and Suyama found a negative correlation between the GC3 and GC1+2 levels (the percent of guanine and cytosine at the third and at the first plus second codon positions, respectively) and explained it as a "need to have a uniform double-helix stability and/or homogeneous codon–anticodon interaction in the gene" (Wada and Suyama, 1986). However, the sliding window approach has an inherent limitation: the window length is an arbitrary parameter that strongly affects the results, so the outcome cannot be given a straightforward biological meaning. Therefore, as an alternative to Wada and Suyama’s approach, the coding regions corresponding to the secondary structures of the encoded proteins were analyzed to determine the base composition along the genes (Chiusano et al., 1999; D’Onofrio et al., 2002). The analysis of multiple alignments of coding sequences belonging to at least four different mammalian orders showed that the G3 and C3 levels differed significantly among the predicted secondary structures of the proteins (Chiusano et al., 1999). The results were confirmed by analyzing a set of human genes for which both the complete coding sequences and crystallographic data of the encoded proteins were available (D’Onofrio et al., 2002). Moreover, the GC3 levels were found to correlate positively with the average hydrophobicity of the secondary structures, whereas the GC1+2 levels showed the opposite trend (D’Onofrio et al., 2002). In other words, the opposite fluctuations of GC3 and GC1+2 levels along the gene were dictated at least in part, if not mainly, by the physico-chemical

0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.gene.2004.11.037

Abbreviations: GC, percent of guanine and cytosine; GC3 and GC1+2, percent of guanine and cytosine at the third and at the first plus second codon positions, respectively; BGC, biased gene conversion; RSCU, relative synonymous codon usage; TE, transposable element.

* Corresponding author.
Tel.: +39 81 5833311; fax: +39 81 7463155.
E-mail address: donofrio@szn.it (G. D’Onofrio).

Gene 345 (2005) 27–33
www.elsevier.com/locate/gene