Journal of Biomolecular Structure &
Dynamics, ISSN 0739-1102
Volume 27, Issue Number 5, (2010)
©Adenine Press (2010)
*Phone: + 91 40 2313 4668
Fax: + 91 40 23130120
E-mail: c_mitra@yahoo.com
Padmavathi Putta
Chanchal K. Mitra*
Department of Biochemistry,
University of Hyderabad,
Hyderabad - 500 046, India
Conserved Short Sequences in Promoter Regions
of Human Genome
http://www.jbsdonline.com
Abstract
Recognition of promoter elements by transcription factors is one of the earliest and
most crucial steps in gene expression and regulation. In prokaryotes, clear signals
identify the promoter region, such as TATAAT at around –10 and TTGACA at around –35
relative to the transcription start site (TSS). In eukaryotes, the promoter regions are
structurally more complex, and there are no conserved or consensus sequences comparable
to those found in prokaryotic promoters.
We have located a set of GC-rich short sequences (<8 nt) that are relatively common in
human promoter sequences around the TSS (±100 nt relative to the TSS). These sequences
were sorted by their frequency of occurrence in the database, and the 50 most common
sequences were used for further study. The sigmoidal behavior of the high end of the
frequency distribution of these sequences suggests the presence of some internal
co-operativity. These short sequences are distributed on both sides of the TSS,
suggesting that transcription factors probably recognize them both upstream and
downstream of the TSS. As eukaryotic promoters lack any conserved sequences, we expect
that these short sequences may help the relevant transcription factors recognize
promoter regions before transcription is initiated. We postulate that a cluster of genes
with common short sequences in the promoter region can be recognized by a particular
transcription factor. We also found that most of these short sequences are fairly common
within miRNA (both mature and stem-loop sequences). Our studies indicate that eukaryotic
transcription is more complex than currently believed.
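The counting step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`count_kmers`, `gc_fraction`, `top_kmers`), the 5 to 7 nt length range, and the toy windows are assumptions made for the example. The text itself specifies only sequences shorter than 8 nt, windows of ±100 around the TSS, and the 50 most common sequences.

```python
from collections import Counter


def count_kmers(windows, k_min=5, k_max=7):
    """Count every short sequence of length k_min..k_max in the given windows.

    `windows` is assumed to hold promoter windows (e.g. -100..+100 around
    the TSS) that have already been extracted from a promoter database.
    """
    counts = Counter()
    for seq in windows:
        seq = seq.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                # Skip k-mers containing ambiguous bases such as N.
                if set(kmer) <= set("ACGT"):
                    counts[kmer] += 1
    return counts


def gc_fraction(kmer):
    """Fraction of G and C bases in a short sequence."""
    return sum(base in "GC" for base in kmer) / len(kmer)


def top_kmers(windows, n=50, k_min=5, k_max=7):
    """Return the n most frequent short sequences with their counts."""
    return count_kmers(windows, k_min, k_max).most_common(n)


# Toy input (hypothetical sequences, far shorter than real promoter windows).
toy_windows = ["GGCGGGGCGG", "TTGGCGGAAC", "ACGTGGCGGT"]
for kmer, count in top_kmers(toy_windows, n=3, k_min=5, k_max=5):
    print(kmer, count, round(gc_fraction(kmer), 2))
```

Counting overlapping occurrences, as done here, is one possible reading of "frequency of occurrence"; counting each sequence at most once per promoter would be an equally plausible alternative.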
Introduction
In prokaryotes, promoter regions carry clear signals, such as TATAAT at around -10
and TTGACA at around -35 relative to the TSS. The eukaryotic genome consists of
introns, exons, promoters, and other functional regions (5). The signals in the
core promoter region are often fuzzy and difficult to decipher. It is usually believed
that there is no universal or conserved core promoter sequence in eukaryotes (9).
Eukaryotic promoters are structurally more complex and therefore require a more
elaborate mechanism for transcription.
The computational identification of promoter regions and their functional evaluation
is an important task in bioinformatics and computational biology. Earlier
studies have targeted the TSS as the signal for recognizing promoter regions.
This works well for prokaryotes but not for eukaryotes, where the signals are
very weak or absent. The TSS plays a relatively minor role in the
whole transcription process in eukaryotes. In an earlier work, we reported
on the collective behavior of promoter sequences using information content
(4). That approach is useful for locating gross features but does not provide key details.
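To illustrate what an information content profile looks like, the sketch below computes a per-position value for a set of aligned sequences. It assumes the common Shannon definition for DNA (2 bits minus the column entropy); the exact measure used in the earlier work (4) is not stated here, so the function name `information_content` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def information_content(aligned):
    """Per-position information content in bits for equal-length DNA sequences.

    Assumes the Shannon form IC = 2 - H, where H is the entropy of the base
    distribution in one alignment column; no small-sample correction is applied.
    """
    profile = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned if seq[pos] in "ACGT"]
        total = len(column)
        if total == 0:
            profile.append(0.0)  # no usable bases at this position
            continue
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(column).values()
        )
        profile.append(2.0 - entropy)
    return profile


# Toy alignment (hypothetical): the first column is fully conserved,
# the second is split evenly between two bases.
print(information_content(["AC", "AG"]))
```

A profile of this kind summarizes how conserved each position is across many promoters, which is why it captures gross features but says nothing about which specific short sequences occur.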