Gene, 152 (1995) 127-132
© 1995 Elsevier Science B.V. All rights reserved. 0378-1119/95/$09.50
SSDI 0378-1119(94)00708-X

GENE 08497

Compositional properties of nuclear genes from Plasmodium falciparum*

(Malaria; parasites; housekeeping genes; antigen genes; Staphylococcus)

Hector Musto**, Helena Rodriguez-Maseda** and Giorgio Bernardi

Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 75005 Paris, France

Received by L. Pereira da Silva: 31 July 1994; Revised/Accepted: 31 August 1994; Received at publishers: 10 October 1994

Correspondence to: Dr. G. Bernardi, Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, 75005 Paris, France. Tel. (33-1) 4329-5824; Fax (33-1) 4427-7977; e-mail: Bernardi@citi2.fr

*Presented at the UNESCO-WHO Meeting on Combatting Malaria, Paris, France, 19-21 January 1994.

**Permanent address: (H.M.) Departamento de Bioquimica, Facultad de Ciencias, Tristan Narvaja 1674, Montevideo 11200, Uruguay. Fax (598-2) 409-973; (H.M. and H.R.-M.) Departamento de Genética, Facultad de Medicina, Gral. Flores 2144, Montevideo, Uruguay. Fax (598-2) 949-563.

Abbreviations: A., Azotobacter; aa, amino acid(s); Ag, antigen(s); bp, base pair(s); AT, % of adenine + thymine; GC, % of guanine + cytosine; HK, housekeeping; kb, kilobase(s) or 1000 bp; N, any nucleoside; nt, nucleotide(s); P., Plasmodium; R, purine (A or G); RSCU, relative synonymous codon usage; S., Staphylococcus; T., Trypanosoma; Y, pyrimidine (C or T).

SUMMARY

We have analyzed the compositional distributions of coding sequences and their different codon positions, as well as the codon usage of the nuclear genes of Plasmodium falciparum, a parasite characterized by an extremely GC-poor genome. As expected, coding sequences are AT-rich, codon usage is strongly biased towards A or T in third codon positions, and some particular amino acids (aa) are especially abundant in the encoded proteins. Remarkably, however, no difference was detected between housekeeping (HK) and antigen (Ag) genes, in spite of differences in expression level and evolutionary constraints. Moreover, all the features found in P. falciparum are very similar to those found in a bacterium characterized by a very GC-poor genome, Staphylococcus aureus. These findings stress the importance of compositional constraints in determining codon usage and aa utilisation.

INTRODUCTION

A striking feature of Plasmodium falciparum, a unicellular parasite responsible for the most virulent and widespread form of human malaria, is that it hosts the GC-poorest nuclear genome known so far (Pollack et al., 1982; McCutchan et al., 1984). This genome, which comprises only 3 × 10^7 bp (Weber, 1988) organized in 14 chromosomes (Kemp et al., 1987a; Wellems et al., 1987), is, therefore, an excellent model to study compositional constraints and their effects.

The analysis of nt sequence data has provided useful information about the genes encoded in the nuclear genome of P. falciparum. The most relevant features are the following: (i) the coding strand is purine-rich; (ii) A is predominant in all codon positions; (iii) the third codon positions are extremely AT-rich, and, as a consequence, codon usage is strongly biased (Weber, 1987; Saul and Battistutta, 1988). The most frequent dinucleotides are, as expected, those containing exclusively A and/or T, whereas the least common ones are those composed only of C and/or G. Furthermore, the frequencies of CG, TA and AC were lower, and those of TG, CC and CA higher, than expected (Weber, 1988; Hyde and Sims, 1987).

We report here an up-to-date analysis of the nuclear coding sequences from P. falciparum, which now comprise 175 kb. We found that the trends described previously with more limited sets of data (see above) are still valid. Taking advantage of the increased number of sequences now available, we tried to understand whether the biases already noted are species-specific or deter-