International Scholarly Research Network
ISRN Bioinformatics
Volume 2012, Article ID 139842, 5 pages
doi:10.5402/2012/139842

Research Article

Bio301: A Web-Based EST Annotation Pipeline That Facilitates Functional Comparison Studies

Yen-Chen Chen,1 Yun-Ching Chen,2 Wen-Dar Lin,3 Chung-Der Hsiao,4 Hung-Wen Chiu,5 and Jan-Ming Ho1

1 Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
2 Department of Biomedical Engineering, The Whitaker Biomedical Engineering Institute at Johns Hopkins University School of Medicine, 720 Rutland Avenue, Baltimore, MD 21205, USA
3 Institute of Plant and Microbial Biology, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
4 Department of Bioscience Technology, Chung Yuan Christian University, 200 Chung Pei Road, Chung Li City 32073, Taiwan
5 Graduate Institute of Biomedical Informatics, Taipei Medical University, 250 Wu-Hsing Street, Taipei City 110, Taiwan

Correspondence should be addressed to Jan-Ming Ho, hoho@iis.sinica.edu.tw

Received 25 July 2011; Accepted 5 September 2011

Academic Editors: Q. Dong and A. Lukas

Copyright © 2012 Yen-Chen Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this postgenomic era, a huge volume of information derived from expressed sequence tags (ESTs) has been accumulated for the functional description of gene expression profiles, and comparative studies have become increasingly important to biology researchers. To facilitate these comparative studies, we have constructed a user-friendly EST annotation pipeline with comparison tools on an integrated EST service website, Bio301.
Bio301 includes regular EST preprocessing, BLAST similarity search, gene ontology (GO) annotation, statistics reporting, a graphical GO browsing interface, and microarray probe selection tools. In addition, Bio301 is equipped with statistical library comparison functions that use GO annotations across multiple EST libraries to mine meaningful biological information.

1. Motivation

Expressed sequence tags (ESTs) [1] are short DNA sequences (usually 200 to 500 nucleotides long) derived by either unidirectional or bidirectional sequencing of cDNA libraries. The information generated from ESTs has been used not only to identify novel gene transcripts, gene locations, and intron-exon boundaries in human and mouse genome drafts [2, 3] but also to assess gene expression levels in given tissues [4]. The large and rapidly growing volume of EST data (59 million EST entries in dbEST as of January 2009) provides an excellent resource for comparative studies, so we have constructed an EST service website, Bio301, to facilitate comparative studies based on these data. Bio301 is equipped with not only an EST annotation pipeline but also functional comparison tools.

Bio301 has five characteristics considered essential for EST analysis tools that aid in functional comparative studies: accurate preprocessing, advanced functional annotation methods, flexibility in comparing multiple EST libraries, retrieval of EST data with respect to the annotation ontology, and an integrated online EST service open to the entire research community.

First, Bio301 preprocesses ESTs accurately by cleaning, clustering, and assembling them. These tasks are very important because accurate preprocessing leads to accurate functional annotation, which is crucial for functional comparison studies. Bio301 uses one of the best programs for sequence cleaning, SeqClean (http://compbio.dfci.harvard.edu/tgi/software/).
Likewise, Bio301 uses state-of-the-art programs for clustering and assembly, TGICL and CAP3 [5, 6]. Since reference genomes with extensive genome annotation have been shown to be helpful for annotation and clustering [7, 8], Bio301 is also equipped with an option for clustering ESTs wherein ESTs are mapped