International Scholarly Research Network
ISRN Bioinformatics
Volume 2012, Article ID 139842, 5 pages
doi:10.5402/2012/139842
Research Article
Bio301: A Web-Based EST Annotation Pipeline That Facilitates Functional Comparison Studies
Yen-Chen Chen,1 Yun-Ching Chen,2 Wen-Dar Lin,3 Chung-Der Hsiao,4 Hung-Wen Chiu,5 and Jan-Ming Ho1
1 Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
2 Department of Biomedical Engineering, The Whitaker Biomedical Engineering Institute at Johns Hopkins University School of Medicine, 720 Rutland Avenue, Baltimore, MD 21205, USA
3 Institute of Plant and Microbial Biology, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
4 Department of Bioscience Technology, Chung Yuan Christian University, 200 Chung Pei Road, Chung Li City 32073, Taiwan
5 Graduate Institute of Biomedical Informatics, Taipei Medical University, 250 Wu-Hsing Street, Taipei City 110, Taiwan
Correspondence should be addressed to Jan-Ming Ho, hoho@iis.sinica.edu.tw
Received 25 July 2011; Accepted 5 September 2011
Academic Editors: Q. Dong and A. Lukas
Copyright © 2012 Yen-Chen Chen et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
In this postgenomic era, a huge volume of information derived from expressed sequence tags (ESTs) has been generated for the functional description of gene expression profiles, and comparative studies have become increasingly important to biologists. To facilitate such comparative studies, we have built a user-friendly EST annotation pipeline with comparison tools on an integrated EST service website, Bio301. Bio301 includes standard EST preprocessing, BLAST similarity search, gene ontology (GO) annotation, statistics reporting, a graphical GO browsing interface, and microarray probe selection tools. In addition, Bio301 provides statistical functions for comparing multiple EST libraries on the basis of their GO annotations, to help researchers mine meaningful biological information.
1. Motivation
Expressed sequence tags (ESTs) [1] are short DNA sequences (usually 200 to 500 nucleotides long) derived from either unidirectional or bidirectional sequencing of cDNA libraries. The information generated from ESTs has been used not only to identify novel gene transcripts, gene locations, and intron-exon boundaries in the human and mouse genome drafts [2, 3] but also to assess gene expression levels in specific tissues [4].
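Using ESTs to assess expression levels rests on counting: in a non-normalized cDNA library, the fraction of ESTs derived from a given transcript roughly tracks that transcript's abundance in the tissue. The minimal Python sketch below illustrates the idea under stated assumptions; the input format (a dictionary from hypothetical EST IDs to gene or cluster IDs), the function name, and the scaling to ESTs per 10,000 are illustrative choices, not part of Bio301.

```python
from collections import Counter


def digital_expression(est_to_gene, per=10_000):
    """Estimate relative gene abundance in one EST library.

    est_to_gene: dict mapping EST id -> gene (or cluster) id
                 (a hypothetical input format used only for illustration).
    Returns a dict mapping gene id -> number of ESTs per `per` ESTs.
    """
    counts = Counter(est_to_gene.values())  # ESTs assigned to each gene
    total = sum(counts.values())            # library size in ESTs
    if total == 0:
        return {}
    return {gene: n * per / total for gene, n in counts.items()}


library = {"est1": "geneA", "est2": "geneA", "est3": "geneB", "est4": "geneA"}
print(digital_expression(library))  # {'geneA': 7500.0, 'geneB': 2500.0}
```

Counts like these are only meaningful as abundance estimates for libraries built without normalization, because normalization deliberately flattens the representation of abundant transcripts.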
The rapidly increasing number of ESTs (59 million entries in dbEST as of January 2009) provides an excellent resource for comparative studies. We have therefore built an EST service website, Bio301, to facilitate comparative studies based on these EST data. Bio301 provides not only an EST annotation pipeline but also functional comparison tools. It has five characteristics that we consider essential for EST analysis tools that support functional comparative studies: accurate preprocessing, advanced functional annotation methods, flexibility in comparing multiple EST libraries, retrieval of EST data with respect to the annotation ontology, and an integrated online EST service open to the entire research community.
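The text does not state which statistic Bio301 uses to compare libraries. As an assumption, the sketch below uses a two-sided Fisher's exact test, a common choice for asking whether a GO term is annotated to a larger share of ESTs in one library than in another. The input format (a dictionary from hypothetical EST IDs to sets of GO IDs) and the function names are illustrative only and do not describe Bio301's implementation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are library 1 and library 2; columns are ESTs with and without
    the GO term. Returns the p-value.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that library 1 holds x of the
        # col1 annotated ESTs, with all table margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    p_value = 0.0
    # Sum over every feasible table that is no more probable than the observed one.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


def compare_go_term(term, lib1, lib2):
    """Test whether `term` is annotated to different shares of ESTs.

    lib1, lib2: dict mapping EST id -> set of GO ids (hypothetical format).
    """
    a = sum(term in gos for gos in lib1.values())  # ESTs with the term in lib1
    c = sum(term in gos for gos in lib2.values())  # ESTs with the term in lib2
    return fisher_exact_two_sided(a, len(lib1) - a, c, len(lib2) - c)


lib1 = {"e1": {"GO:0006412"}, "e2": {"GO:0006412"}, "e3": set()}
lib2 = {"f1": set(), "f2": {"GO:0008150"}}
print(round(compare_go_term("GO:0006412", lib1, lib2), 4))  # 0.4
```

When many GO terms are tested at once, the resulting p-values should be adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure).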
First, Bio301 preprocesses ESTs accurately by cleaning, clustering, and assembling them. These tasks are important because accurate preprocessing leads to accurate functional annotation, which is crucial for functional comparison studies. For sequence cleaning, Bio301 uses SeqClean (http://compbio.dfci.harvard.edu/tgi/software/), one of the best programs available for this task. Likewise, Bio301 uses state-of-the-art programs for clustering and assembly, TGICL and CAP3 [5, 6]. Since reference genomes with extensive genome annotation have been shown to help annotation and clustering [7, 8], Bio301 is also equipped with an option for clustering ESTs wherein ESTs are mapped