Journal of Biomolecular Structure & Dynamics, ISSN 0739-1102
Volume 27, Issue Number 5, (2010)
©Adenine Press (2010)
http://www.jbsdonline.com

Conserved Short Sequences in Promoter Regions of Human Genome

Padmavathi Putta
Chanchal K. Mitra*

Department of Biochemistry, University of Hyderabad, Hyderabad - 500 046, India

*Phone: +91 40 2313 4668
Fax: +91 40 23130120
E-mail: c_mitra@yahoo.com

Abstract

Recognition of promoter elements by transcription factors is one of the earliest and most crucial steps in gene expression and regulation. In prokaryotes, there are clear signals that identify the promoter regions, such as TATAAT at around –10 and TTGACA at around –35 relative to the transcription start site (TSS). In eukaryotes, the promoter regions are structurally more complex and there are no conserved or consensus sequences similar to the ones found in prokaryotic promoters. We have located a set of GC-rich short sequences (<8 nt) that are relatively common in human promoter sequences around the TSS (±100 relative to TSS). These sequences were sorted by their frequency of occurrence in the database, and the 50 most common sequences were used for further studies. Sigmoidal behavior at the high end of the frequency distribution of these sequences suggests the presence of some internal cooperativity. These short sequences are distributed on both sides of the TSS, suggesting that transcription factors probably recognize them both upstream and downstream of the TSS. As eukaryotic promoters lack any conserved sequences, we expect that these short sequences may help relevant transcription factors recognize promoter regions prior to the initiation of transcription. We postulate that a cluster of genes with common short sequences in the promoter region can be recognized by a particular transcription factor. We also found that most of these short sequences are fairly common within miRNA (both mature and stem-loop sequences).
Our studies indicate that eukaryotic transcription is more complex than currently believed.

Introduction

In prokaryotes, there are clear signals within the promoter regions, such as TATAAT at around -10 and TTGACA at around -35 (relative to the TSS). In eukaryotes, the genome consists of introns, exons, promoters and other functional regions (5). The signals in the core promoter region are often fuzzy and difficult to decipher. It is usually believed that there is no universal or conserved core promoter sequence in eukaryotes (9). Eukaryotic promoters are structurally more complex and therefore require a more complicated mechanism for transcription.

The computational identification of promoter regions and their functional evaluation is an important task in bioinformatics and computational biology. Earlier studies have targeted the TSS as the signal for the recognition of promoter regions. This works well in prokaryotes but not in eukaryotes, where the signals are very weak or absent. The TSS plays a relatively minor role in the whole transcription process in eukaryotes. In an earlier work, we reported on the collective behavior of promoter sequences using information content (4). This approach is useful for locating gross features but misses key details.
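
As a rough illustration of the information-content measure mentioned above, the following minimal Python sketch computes a per-position profile for a set of aligned, equal-length sequences. It assumes the standard Shannon formulation for a four-letter alphabet, I = 2 - H bits per position, with no small-sample correction; the earlier work (4) may define the measure differently. The function name `information_content` and the toy sequences are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def information_content(aligned_seqs):
    """Per-position information content (bits) of equal-length DNA sequences.

    Assumption: sequences contain only A, C, G and T, so the maximum
    entropy per position is log2(4) = 2 bits. No small-sample correction
    is applied.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must have equal length")

    n = len(aligned_seqs)
    profile = []
    for pos in range(length):
        # Count how often each base occurs at this aligned position.
        counts = Counter(s[pos].upper() for s in aligned_seqs)
        # Shannon entropy H of the observed base frequencies.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Information content: 2 bits (maximum) minus the observed entropy.
        profile.append(2.0 - entropy)
    return profile


# Toy windows invented for illustration; these are not real promoter data.
toy = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
print([round(x, 2) for x in information_content(toy)])
```

A profile of this kind gives one number per position: it indicates where conservation is concentrated, but not which short sequences are responsible for it.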