American Journal of Bioinformatics Research 2014, 4(1): 11-22
DOI: 10.5923/j.bioinformatics.20140401.03

Functional Characterization of Expressed Sequence Tags of Bread Wheat (Triticum aestivum) and Analysis of CRISPR Binding Sites for Targeted Genome Editing

Shailesh Sharma*, Santosh Kumar Upadhyay*

National Agri-Food Biotechnology Institute (Department of Biotechnology, Government of India), C-127, Industrial Area, S.A.S. Nagar, Phase 8, Mohali, Punjab, 160071, India

* Corresponding author: haitoshailesh@gmail.com (Shailesh Sharma), santoshnbri@hotmail.com (Santosh Kumar Upadhyay)
Published online at http://journal.sapub.org/bioinformatics
Copyright © 2014 Scientific & Academic Publishing. All Rights Reserved

Abstract Bread wheat (Triticum aestivum) is one of the leading food crops worldwide. However, functional characterization of the wheat genome is still in progress because of its huge size (~17 Gb). We aimed to contribute to this effort by functionally characterizing wheat EST sequences. Wheat EST sequences (1.2 million available in the EST database) were cleaned and assembled into 27268 contigs under stringent parameters. About 89% (24339) of the contigs were functionally annotated by BlastX search against the NCBI-NR protein database with an e-value cutoff of 10^-5. The annotated contigs were further classified into Gene Ontology (GO) terms and mapped to KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways using the Blast2GO program. A total of 78827 GO terms and 132 KEGG pathways were identified. Purine metabolism and starch and sucrose metabolism were the major pathways. The inositol phosphate metabolism pathway, responsible for the synthesis of phytic acid (an anti-nutritional component), was also significantly represented in wheat. We identified 3327 EST-SSRs in 2832 contigs and probable CRISPR binding sites in each contig. Further, a hypothetical phytic acid biosynthetic pathway and possible important target genes for reducing the phytic acid content of wheat with the CRISPR-Cas system are also described. Our study provides genetic information about an important food crop as well as a method for nutritional improvement using a modern biotechnology tool.

Keywords Bread wheat, EST, KEGG, GO, CRISPR, Phytic acid

1. Introduction

Bread wheat (Triticum aestivum) is one of the most important food crops, accounting for ~21% of the food calories of 75% of the world population (Braun et al. 2010). This figure is continuously increasing, and the demand for wheat is estimated to double by 2050. On the other hand, changes in climatic conditions might decrease wheat production in the coming years (Rosegrant et al. 2010). The introduction of new genetic and molecular biology tools for genome sequencing and genome engineering, alongside breeding programs, might be very useful for understanding wheat biology and improving crop yield (Wilson et al. 2004; Upadhyay et al. 2013). Bread wheat has one of the largest and most complex genomes: it is allohexaploid and ~17 Gb in size, about 40 times the size of the rice genome (Arumuganathan and Earle 1991). Characterizing such a genome is itself a big challenge; however, it is urgently needed.

Expressed sequence tags (ESTs) provide very useful information about gene sequences and their expression (Duggan et al. 1999). EST sequencing of many plant species is either completed or under way, and ESTs are very useful in gene discovery (Ewing et al. 1999; Fulton et al. 2002; Hughes et al. 2004; Ronning et al. 2003; Schlueter et al. 2004). Because the amount of genome sequence is continuously increasing as sequencing costs fall, functional annotation and characterization have become a great challenge. In the case of wheat, genome sequencing is progressing rapidly (www.wheatgenome.org), and functional characterization of wheat ESTs can be a quick and complementary approach. Further, this resource will be highly valuable in crop improvement programs as well as in the annotation of the wheat genome.

Genome manipulation has become a very important tool for crop improvement. Transcription activator-like effector nucleases and zinc finger nucleases have been used to generate valuable mutations in plants and other organisms (Chen et al. 2013; Zhang et al. 2010). However, these technologies require protein engineering and are complicated to design. A new technology based on the prokaryotic type II CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-CRISPR associated protein) system has been reported