Evolution of tRNA-like sequences and genome variability

Felix E. Frenkel *, Maria B. Chaley, Eugene V. Korotkov, Konstantin G. Skryabin

Centre “Bioengineering” RAS, Prospekt 60-letiya Oktyabrya, 7/1, Moscow 117312, Russia

Received 31 October 2003; received in revised form 16 February 2004; accepted 5 March 2004
Available online 5 May 2004
Received by T. Gojobori

Abstract

Transfer RNA (tRNA)-like sequences were searched for in the nine basic taxonomic divisions of GenBank-121 (viruses, phages, bacteria, plants, invertebrates, vertebrates, rodents, mammals, and primates) with an original program package that implements a dynamic profile alignment approach for the analysis of genetic texts, using 22 profiles of tRNAs of different isotypes. In total, 175,901 previously unknown tRNA-like sequences were found. The locations of the tRNA-like sequences were examined with respect to regions whose functional meaning is described by standard GenBank Feature Keys. Many regions containing tRNA-like sequences were recognized as known repeats. The tRNA-like sequences are proposed to spread through a genome as part of the expansion of various transposable elements. An analysis of the integrity of the RNA polymerase III internal promoters in the tRNA-like sequences across the GenBank divisions showed a high potential for generating new copies of short interspersed nuclear element (SINE) repeats in all divisions except primates. The numerous tRNA-like sequences found in RNA polymerase II promoter regions suggest that the RNA polymerase III promoter has adapted to bind RNA polymerase II.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Computer analysis; Genome evolution; tRNAs; Transposable element

1. Introduction

Transfer RNAs (tRNAs), along with the various RNAs of ribonucleoprotein complexes (first and foremost ribosomal RNAs), can rightly be considered among the most ancient genome sequences, originating as far back as the RNA World (Jeffares et al., 1998). Several features have made tRNAs attractive for recruitment into other functional processes in the genome: their well-defined and relatively simple spatial structure; their recognition by specialized enzymes such as aminoacyl-tRNA synthetases (aaRSs), RNase P, various modifying enzymes, and biosynthesis factors; and the presence of an internal RNA polymerase III promoter in tRNA genes (tDNAs). For example, in retroviruses, tRNAs of the producer cell serve as primers for reverse transcriptase (Cen et al., 2002). In addition, reverse transcription of a retrovirus can be primed by endogenous tRNAs of the newly infected cell (Schmitz et al., 2002). There were probably several molecular variants of tRNA prototype structures, of which only one was selected by evolution as the adaptor of the genetic code (Giege et al., 1998a). All these variants were recognized by specific enzymes through a small number of nucleotides embedded in their structures, the so-called “identity elements”. Examples of such identity elements include the single bases and base pairs of the “operational code” (Schimmel

0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.gene.2004.03.005

Abbreviations: aaRS, aminoacyl-tRNA synthetase; BCT, bacteria; CDS, protein-coding sequence; D-loop, displacement loop; Gb, gigabase; INV, invertebrates; LINE, long interspersed nuclear element; LTR, long terminal repeat; MAM, mammals; mat_peptide, coding sequence for the mature peptide; MER, medium reiterated repeat; MIR, mammalian-wide interspersed repeat; misc_feature, region of biological interest which cannot be described by any other Feature Keys; misc_RNA, any transcript or RNA product that cannot be defined by other RNA keys (mRNA, tRNA, scRNA, snRNA, and others); MPI, message passing interface; mRNA, messenger RNA; PHG, phages; PLN, plants; precursor_RNA, any RNA species that is not yet the mature RNA product; PRI, primates; prim_transcript, primary transcript; protein_bind, protein binding site; RNase, ribonuclease; ROD, rodents; rRNA, mature ribosomal RNA; scRNA, small cytoplasmic RNA; sig_peptide, signal peptide coding sequence; SINE, short interspersed nuclear element; snRNA, small nuclear RNA; STS, sequence tagged site; tDNA, tRNA gene; tRNA, transfer RNA; URL, uniform resource locator; UTR, untranslated region; V_region, variable segment of immunoglobulin light and heavy chains; VRL, viruses; VRT, vertebrates.

* Corresponding author. Tel.: +7-95-135-2161; fax: +7-95-135-0571. E-mail address: felix@biengi.ac.ru (F.E. Frenkel).

www.elsevier.com/locate/gene
Gene 335 (2004) 57–71