Indian Journal of Biochemistry & Biophysics
Vol. 39, February 2002, pp. 35-48

Compositional correlation and codon usage studies in Buchnera aphidicola

S K Gupta, T K Bhattacharyya and T C Ghosh*

Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M, Calcutta 700 054, India

Received 5 February 2001; revised and accepted 24 May 2001

Compositional distributions at the three codon positions, as well as codon usage biases, of all available DNA sequences of the Buchnera aphidicola genome have been analyzed. The GC levels at the three codon positions follow the order I>II>III, as observed in other extremely AT rich organisms. B. aphidicola, being an AT rich organism, is expected to have A and/or T at the third positions of codons. Overall codon usage analyses indicate that A and/or T ending codons are predominant in this organism and that some particular amino acids are abundant in the coding regions of genes. However, multivariate statistical analysis indicates two major trends in the codon usage variation among the genes: one is strongly correlated with the GC content at the third synonymous positions of codons, and the other is associated with the expression level of genes. Moreover, codon usage biases of the highly expressed genes are almost identical to the overall codon usage biases of all the genes of this organism. These observations suggest that mutational bias is the main factor determining the codon usage variation among the genes in B. aphidicola.

Studies on the compositional properties and codon usage biases of the Buchnera aphidicola genome can provide valuable information on the genetic organization of this organism. The correlation between bacterial genomic (G+C) content and phylogeny has long been known¹,². The GC contents of the first, second and third positions of codons are each positively correlated with the overall genomic (G+C) content²⁻⁴.
However, the magnitude of the correlation differs among the three codon positions. The universal order of correlation among the three codon positions has been observed to be III>I>II⁵. Codon usage data can be used to understand the basic features of the genomic organization of an organism⁶⁻¹². It is well known that synonymous codon usage bias is non-random and species specific¹³. Codon usage patterns differ significantly not only from organism to organism, but also among different genes within the same organism¹⁴. Compositional constraints play a major role in determining the codon usage variation among the genes, as observed in extremely GC rich or AT rich organisms¹⁵⁻¹⁷. In some unicellular organisms, both compositional constraints and translational selection are operational in determining the codon usage variation among the genes¹⁸⁻²². Recently, it has been reported that the cellular location of the gene products also determines the codon usage variation among the genes¹², and it was also observed that replication-translational selection is responsible for codon usage variation among the genes of the Borrelia burgdorferi genome¹⁰.

B. aphidicola is an obligate intracellular, non-cultivable bacterial symbiont of the aphid Schizaphis graminum²³. Being an AT rich organism²⁴, it is an excellent model for analyzing the compositional correlations among the three codon positions, which can yield useful information for understanding the codon usage variation among the genes in this organism. With this in view, we have collected all DNA sequences from the complete genome and made a detailed study of the compositional properties as well as the codon usage biases of this organism.

*To whom correspondence may be addressed. Fax: 91-334-3886; E-mail: tapash@boseinst.ernet.in

Materials and Methods

The complete genome of B. aphidicola was obtained from ncbi.nlm.nih.gov/genbank/genomes.
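To make the quantity analysed in this study concrete, the following is a minimal Python sketch of how the (G+C) content at each of the three codon positions can be computed for a single coding sequence. It is an illustration only and is not the software used in this work; the function name `gc_by_codon_position` and the toy sequence are hypothetical, and the sequence is assumed to be in frame and to contain only the letters A, C, G and T. All third positions are counted, so the value for position III is not restricted to synonymous sites.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence.

    Hypothetical helper for illustration; assumes the sequence starts at the
    first base of a codon. Trailing bases that do not fill a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # length covered by complete codons
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        # Every third base starting at offset pos: codon position pos + 1.
        bases = seq[pos:usable:3]
        gc_count = sum(1 for base in bases if base in "GC")
        fractions.append(gc_count / len(bases))
    return tuple(fractions)


# Toy example: three codons GCT, GCA, GAA (not taken from the genome).
gc1, gc2, gc3 = gc_by_codon_position("GCTGCAGAA")
```

In the toy sequence every first position is G, two of the three second positions are C, and no third position is G or C, so the three values decrease from position I to position III, the same ordering reported here for the genome as a whole.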
To minimize sampling errors, we have chosen only those sequences that are longer than 300 bp and have correct initiation and termination codons. In this way we have selected 438 DNA sequences for the data analysis. The (G+C) distributions at the three codon positions, as well as the overall genomic (G+C) content, were calculated using GCUA (General Codon Usage Analysis)²⁵. Relative Synonymous Codon Usage (RSCU) values were used to study the overall codon usage variation among the genes. RSCU is defined as the