Gene, 152 (1995) 127-132
© 1995 Elsevier Science B.V. All rights reserved. 0378-1119/95/$09.50
GENE 08497
Compositional properties of nuclear genes from Plasmodium falciparum*
(Malaria; parasites; housekeeping genes; antigen genes; Staphylococcus)
Hector Musto**, Helena Rodriguez-Maseda** and Giorgio Bernardi
Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 75005 Paris, France
Received by L. Pereira da Silva: 31 July 1994; Revised/Accepted: 31 August 1994; Received at publishers: 10 October 1994
SUMMARY
We have analyzed the compositional distributions of coding sequences and their different codon positions, as well as
the codon usage of the nuclear genes of Plasmodium falciparum, a parasite characterized by an extremely GC-poor
genome. As expected, coding sequences are AT-rich, codon usage is strongly biased towards A or T in third codon
positions, and some particular amino acids (aa) are especially abundant in the encoded proteins. Remarkably, however,
no difference was detected between housekeeping (HK) and antigen (Ag) genes, in spite of differences in expression level
and evolutionary constraints. Moreover, all the features found in P. falciparum are very similar to those found in a
bacterium characterized by a very GC-poor genome, Staphylococcus aureus. These findings stress the importance of
compositional constraints in determining codon usage and aa utilisation.
INTRODUCTION
A striking feature of Plasmodium falciparum, a unicellular
parasite responsible for the most virulent and widespread
form of human malaria, is that it hosts the
GC-poorest nuclear genome known so far (Pollack et al.,
1982; McCutchan et al., 1984). This genome, which only
comprises 3 × 10^7 bp (Weber, 1988) organized in 14 chromosomes
(Kemp et al., 1987a; Wellems et al., 1987),
is, therefore, an excellent model to study compositional
constraints and their effects.
Correspondence to: Dr. G. Bernardi, Laboratoire de Génétique
Moléculaire, Institut Jacques Monod, 2 Place Jussieu, 75005 Paris,
France. Tel. (33-1) 4329-5824; Fax (33-1) 4427-7977;
e-mail: Bernardi@citi2.fr
*Presented at the UNESCO-WHO Meeting on Combatting Malaria,
Paris, France, 19-21 January 1994.
**Permanent address: (H.M.) Departamento de Bioquímica, Facultad
de Ciencias, Tristan Narvaja 1674, Montevideo 11200, Uruguay. Fax
(598-2) 409-973; (H.M. and H.R.-M.) Departamento de Genética,
Facultad de Medicina, Gral. Flores 2144, Montevideo, Uruguay. Fax
(598-2) 949-563.
Abbreviations: A., Azotobacter; aa, amino acid(s); Ag, antigen(s); bp,
base pair(s); AT, % of adenine + thymine; GC, % of guanine + cytosine;
HK, housekeeping; kb, kilobase(s) or 1000 bp; N, any nucleoside; nt,
nucleotide(s); P., Plasmodium; R, purine (A or G); RSCU, relative
synonymous codon usage; S., Staphylococcus; T., Trypanosoma; Y, pyrimidine
(C or T).
The analysis of nt sequence data has provided useful
information about the genes encoded in the nuclear
genome of P. falciparum. The most relevant features are
the following: (i) the coding strand is purine rich; (ii) A
is predominant in all codon positions; (iii) the third codon
positions are extremely AT-rich, and, as a consequence,
codon usage is strongly biased (Weber, 1987; Saul and
Battistutta, 1988). The most frequent dinucleotides are,
as expected, those containing exclusively A and/or T,
whereas the least common ones are those composed only
of C and/or G. Furthermore, CG, TA and AC were lower,
and TG, CC and CA were higher than the expected
frequencies (Weber, 1988; Hyde and Sims, 1987).
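The two kinds of measurement described above, base composition at each codon position and observed versus expected dinucleotide frequencies, can be illustrated with a minimal sketch. It is not the procedure used in the cited studies: the function names and the toy sequence are hypothetical, and the expected dinucleotide frequency is assumed here to be the product of the two single-base frequencies, which is one common convention.

```python
from collections import Counter


def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) in percent for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    percentages = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every base at this codon position
        gc = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc / len(bases))
    return tuple(percentages)


def dinucleotide_odds(seq):
    """Observed/expected ratio for each dinucleotide present in seq.

    Assumption: expected frequency = f(first base) * f(second base).
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    ratios = {}
    for pair, count in di.items():
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        ratios[pair] = (count / n_di) / expected
    return ratios


# Hypothetical AT-rich toy sequence, not taken from any real gene.
toy_cds = "ATGAAAAATTTAGATGAATTATAA"
gc1, gc2, gc3 = gc_by_codon_position(toy_cds)
odds = dinucleotide_odds(toy_cds)
```

A ratio below 1 marks a dinucleotide that is rarer than the base composition alone would predict, and a ratio above 1 marks one that is more frequent than predicted.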
We report here an up-to-date analysis of the nuclear
coding sequences from P. falciparum, which now comprise
175 kb. We found that the trends described previously
with more limited sets of data (see above) are still valid.
Taking advantage of the increased number of sequences
which are now available, we tried to understand whether
the biases already noted are species-specific or deter-
SSDI 0378-1119(94)00708-X