Specific Correlations between Relative Synonymous Codon Usage and Protein Secondary Structure

Matej Orešič and David Shalloway*

Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY 14853, USA

We found significant species-specific correlations between the use of two synonymous codons and protein secondary structure units by comparing the three-dimensional structures of human and Escherichia coli proteins with their mRNA sequences. The correlations are not explained by codon-context, expression level, GC/AU content, or positional effects. The E. coli correlation is between Asn AAC and the C-terminal regions of β-sheet segments; it may result from selection for translational accuracy, suggesting the hypothesis that downstream Asn residues are important for β-sheet formation. The correlation in human proteins is between Asp GAU and the N termini of α-helices; it may be important for eukaryote-specific sequential, cotranslational folding. The kingdom-specific correlations may reflect kingdom-specific differences in translational mechanisms. The correlations may help identify residues that are important for secondary structure formation, be useful in secondary structure prediction algorithms, and have implications for recombinant gene expression.

© 1998 Academic Press

Keywords: cotranslational folding; translational accuracy; aspartate; asparagine; statistical database analysis

*Corresponding author

Introduction

Because there are, on average, approximately three synonymous codons (SCs) for each amino acid, gene sequences can potentially carry much more information than is needed for determining protein amino acid sequences.
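The "approximately three" figure can be checked directly against the standard genetic code. The following is a minimal sketch, not part of the original analysis: the function names are illustrative, and the capacity estimate assumes (unrealistically) that all 20 amino acids are equally frequent and that codon choice is otherwise unconstrained, so it is only a rough upper bound.

```python
import itertools
import math
from collections import defaultdict

# Standard genetic code in the conventional U, C, A, G ordering
# (first base varies slowest, third base fastest); "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(table=CODON_TABLE):
    """Group sense codons by the amino acid they encode (stop codons skipped)."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        if aa != "*":
            groups[aa].append(codon)
    return dict(groups)


def mean_degeneracy(groups):
    """Average number of synonymous codons per amino acid."""
    return sum(len(codons) for codons in groups.values()) / len(groups)


def mean_choice_bits(groups):
    """Average log2(number of SCs) per amino acid.

    Assumption for illustration only: amino acids are weighted equally
    and every synonymous codon is equally usable.
    """
    return sum(math.log2(len(codons)) for codons in groups.values()) / len(groups)


if __name__ == "__main__":
    groups = synonymous_codons()
    print("mean SCs per amino acid:", mean_degeneracy(groups))
    print("Asn codons:", sorted(groups["N"]), "Asp codons:", sorted(groups["D"]))
    print("spare bits per residue (rough bound):", round(mean_choice_bits(groups), 2))
```

With 61 sense codons shared among 20 amino acids, the mean is 3.05 SCs per amino acid, and under the stated assumptions the choice among SCs could carry roughly 1.4 bits per residue beyond what specifies the amino acid.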
Relative SC usages (RSCU, the relative frequencies of occurrence of the SCs for a specific amino acid) vary by factors of ten or more in species-specific ways (Sharp & Li, 1986, 1987), probably as the result of evolution in the presence of mutational biases, selection for translation rate and accuracy (Fiers & Grosjean, 1979; Ikemura, 1985; Bulmer, 1991; Sharp & Matassi, 1994; Akashi, 1994), and possibly other factors. Important influences affecting RSCU could arise from the interactions of nucleic acids (e.g. dinucleotide-dependent mutational biases, DNA and RNA structural constraints, and requirements for RNA stability) or from events at the ribosome (e.g. differing translation rates and accuracies of different SCs). Karlin & Mrázek (1996) have presented an extensive list of potential influences and have shown that most of the observed RSCU bias in human and vertebrate genes can be calculated from species-specific "genome signatures" (relative dinucleotide abundance frequencies). Such biases tend to proscribe SC choice and reduce the excess information-carrying capacity of the gene sequence, but a significant surplus remains.

It is not known if any of this excess capacity is used. In principle, additional information could help regulate processes at the DNA, RNA, or protein levels and affect replication, transcription or translation. However, except in viruses, whose genome sizes are often tightly constrained, large amounts of non-coding sequence are usually available for controlling nucleic acid processing and translational initiation, so there would probably not be much evolutionary pressure to use the SC degeneracy for these purposes. In contrast, relatively little non-coding mRNA is present at the ribosome, so the unused information-carrying capacity could be used there to promote correct folding.

It is now recognized that many proteins cannot fold correctly in isolation and that assistance (e.g.
by chaperonins and protein disulfide isomerases) is often required (Gething & Sambrook, 1992; Hartl, 1996).

E-mail address of the corresponding author: dis2@cornell.edu
Abbreviations used: SC, synonymous codon; RSCU, relative SC usage; UCU, unrenormalized codon usage; PDB, Protein Data Bank.
Article No. mb981921. J. Mol. Biol. (1998) 281, 31–48. 0022–2836/98/310031–18 $30.00/0

The nascent peptide can begin folding before it is released from the ribosome (Federov et al., 1992; Goldberg, 1995), and recent evidence