IJSRSET1622104 | Received: 20 March 2016 | Accepted: 264 March 2016 | March-April 2016 [(2)2 337-343]
© 2016 IJSRSET | Volume 2 | Issue 2 | Print ISSN : 2395-1990 | Online ISSN : 2394-4099
Themed Section: Science and Technology
Computational Techniques for the Functional Annotation of
Hypothetical ORFS in Human Chromosome 3
Sivashankari Selvarajan¹, Piramanayagam Shanmughavel²
¹Assistant Professor in UGC, Innovative Programme, Department of Bioinformatics, Nirmala College for Women, Coimbatore, Tamil Nadu, India
²Associate Professor, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India
ABSTRACT
In biochemistry, a hypothetical protein is a protein, encoded by a hypothetical gene, whose existence has been
predicted but for which there is no experimental evidence of expression in vivo. As a result, the function of such genes
is not known. Despite several efforts, only 50-60% of the genes in most completely sequenced
genomes have been annotated with a known function. The functions of the remaining 40-50% of the genes in any genome
are entirely unknown. As of September 2010, around 637 genes were annotated as hypothetical in NCBI. The present
investigation therefore focuses on the functional annotation of hypothetical genes in chromosome 3 of the human genome.
Keywords: Annotation, Chromosome 3, Function, Hypothetical Genes
I. INTRODUCTION
The Human Genome Project revealed the three billion
base pairs encoded within the twenty-three pairs of
chromosomes in the human genome. The human
genome contains some 30,000 genes, whose exons
constitute only about 1% of the ~3 billion base pairs of
total human DNA. Among these are genes (called
hypothetical ORFs) that code for so-called
“hypothetical proteins”, whose existence is either
validated experimentally or predicted computationally
but whose function has not yet been reported. Hence,
after the completion of the genome sequence, the
challenge ahead for biologists is to use the data to
interpret the function of the protein, the cell, and the
organism. This can be achieved by a process called
annotation, which involves identifying the genes within
a chromosome and their fine structure, determining the
protein products encoded by each gene, and
understanding their function (Venter et al., 2001). A
group of these genes may be involved in many
pathological disorders and hence be of pharmaceutical
significance. Thus, annotation is essential to
understanding the mechanisms behind the cellular
processes and molecular functions of a genome.
However, the accuracy of genome annotation was
inconsistent in the initial stages; it has since improved
owing to advances in computational algorithms and the
capabilities of bioinformatics. After annotation of the
human genome, a large share of the genes (59%)
reported by the project were hypothetical genes or
annotated genes with unknown function (Venter et al.,
2001) (Table 1).
Table 1. The Human Genome Statistics
S.No  Topic                                        Statistic
1     Total size of the genome                     approximately 3,200,000,000 bp
2     Percentage of DNA spanned by genes           between 25% and 38%
3     Percentage of exons                          1.1 to 1.4%
4     Percentage of introns                        24% to 37%
5     Occurrence rate of genes                     about 12 per 1,000,000 bp
6     Percent of hypothetical genes and            59%
      annotated genes with unknown function
      in the genome
Source: Venter et al., 2001
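To make the proportions in the table easier to compare, the percentages can be converted into approximate absolute quantities. The short Python sketch below is only illustrative: the constant and function names are hypothetical, the inputs are the figures quoted in the table and the Introduction, and applying the 59% share to the 30,000-gene figure assumes that both numbers refer to the same gene set.

```python
# Illustrative arithmetic on the figures quoted in the table above
# (Venter et al., 2001). All names here are hypothetical helpers.

GENOME_SIZE_BP = 3_200_000_000   # approximate total genome size
GENIC_PERCENT = (25.0, 38.0)     # DNA spanned by genes (low, high)
EXON_PERCENT = (1.1, 1.4)        # exons (low, high)
INTRON_PERCENT = (24.0, 37.0)    # introns (low, high)
HYPOTHETICAL_PERCENT = 59.0      # hypothetical or unknown-function genes
GENE_COUNT = 30_000              # gene count quoted in the Introduction


def percent_to_mbp(percent, genome_size_bp=GENOME_SIZE_BP):
    """Convert a percentage of the genome into megabase pairs (Mbp)."""
    return percent / 100.0 * genome_size_bp / 1_000_000


def range_to_mbp(bounds):
    """Convert a (low, high) percentage range into a (low, high) Mbp range."""
    low, high = bounds
    return percent_to_mbp(low), percent_to_mbp(high)


def hypothetical_gene_count(total_genes=GENE_COUNT,
                            percent=HYPOTHETICAL_PERCENT):
    """Estimate how many genes fall in the hypothetical/unknown class.

    Assumes the percentage applies to the quoted total gene count.
    """
    return round(total_genes * percent / 100.0)


if __name__ == "__main__":
    for label, bounds in [("Genes", GENIC_PERCENT),
                          ("Exons", EXON_PERCENT),
                          ("Introns", INTRON_PERCENT)]:
        low, high = range_to_mbp(bounds)
        print(f"{label}: {low:.1f} to {high:.1f} Mbp")
    print("Hypothetical or unknown-function genes:",
          hypothetical_gene_count())
```

Under these assumptions, 59% of 30,000 genes corresponds to 17,700 genes of hypothetical or unknown function, and exons account for roughly 35 to 45 Mbp of the ~3,200 Mbp genome.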