Plant Molecular Biology 35: 993–1001, 1997.
© 1997 Kluwer Academic Publishers. Printed in Belgium.

Short communication

Context sequences of translation initiation codon in plants

Chandrashekhar P. Joshi¹, Hao Zhou², Xiaoqiu Huang² and Vincent L. Chiang¹

¹Plant Biotechnology Research Center, Institute of Wood Research, School of Forestry and Wood Products (author for correspondence) and ²Department of Computer Science, Michigan Technological University, Houghton, MI 49931, USA

Received 14 May 1997; accepted in revised form 22 July 1997

Key words: translation initiation, initiator codon, dicotyledons, monocotyledons, AUG context

Abstract

In this survey of 5074 plant genes for their AUG context sequences, purines are present at the −3 and +4 positions in about 80% of the sequences. Although this observation is similar to the vertebrate consensus sequence, the number of plant mRNAs with purines at the −3 position is lower and at the +4 position is higher than reported for vertebrate mRNAs. Higher plants have an AC-rich consensus sequence, caA(A/C)aAUG GCg, as the context of the translation initiator codon. Between the two major groups of angiosperms, the context of the AUG codon in dicot mRNAs is aaA(A/C)aAUG GCu, which is similar to the higher-plant consensus, but monocot mRNAs have c(a/c)(A/G)(A/C)cAUG GCG as a consensus, which exhibits an overall similarity with the vertebrate consensus. The experimental evidence regarding the importance of the AUG context in plants is discussed.

About ten years ago, Joshi [7] proposed a consensus sequence for the context of the AUG codon in higher plants on the basis of 79 genomic sequences. That survey was useful to many plant researchers in identifying the possible translation initiation codon in new genes. However, the number of plant genes that were available at that time was small. Moreover, certain gene families were over-represented, resulting in a skewed representation of the data.
Since these genes were also analyzed for many other genomic sequence features, such as the TATA box, transcription start site and leader sequences, in addition to the context of the initiator AUG, cDNAs were excluded. The size of GenBank has increased many-fold in the past 10 years, and genes from numerous plants and gene families have now been sequenced. Therefore, it is of interest to examine whether the previous conclusions, based on limited data, are still valid when an unbiased and extensive collection of both cDNAs and genomic sequences from plants is considered.

The scanning mechanism of translation initiation in eukaryotes postulates that the 43S translation initiation complex, including the small subunit of the ribosome, binds to the capped 5′ end of the mRNA and scans the mRNA linearly until the first AUG codon in a favorable context is found [15]. At this point, the large subunit of the ribosome joins the small subunit and translation begins. In vertebrates, five structural features of the nuclear mRNA leaders are considered important for the efficiency and/or fidelity of translation initiation at a specific AUG codon: (1) the presence of an m7G cap, (2) the context of the AUG codon, (3) the proximity of the AUG to the 5′ end, (4) the secondary structure upstream and downstream from the AUG codon, and (5) the leader sequence length [15]. In general, most higher-plant mRNAs are capped, have AU-rich leaders that reduce the potential for secondary structure formation, are short in length (less than 200 bp), and begin translation at the first AUG codon [7, 9]. Based on a collection of 699 vertebrate mRNAs, Kozak proposed (GCC)GCC(A/G)CCAUG G as the consensus sequence for the context of a functional AUG codon [12]. The optimum context of an AUG codon in a vertebrate mRNA has been confirmed by mutational analysis [13]. However, distinct inter-taxon variations in the AUG context sequence are repeatedly observed when inver-