Int. J Comp Sci. Emerging Tech Vol-2 No 5 October, 2011 310

Biological Sequence Alignment for Bioinformatics Applications Using MATLAB

Sonali Vijan 1 and Rajesh Mehra 2
1 Student, Electronics Engineering, NITTTR, Chandigarh
2 Faculty Member, Electronics Engineering, #3290, Sector 35 D, Chandigarh
Email: sonali.vijan@gmail.com

Abstract: Biological sequence alignment is a widely used operation in bioinformatics and computational biology, where it serves to determine the similarity between biological sequences. This paper uses the two basic alignment algorithms: Smith-Waterman for local alignment and Needleman-Wunsch for global alignment. The algorithms have been developed and simulated in MATLAB for genome analysis and sequence alignment. Local and global alignments are presented, and the results are shown as dot plots together with the local and global scores for the sequences. The proposed work is a useful tool that can aid in the exploration, interpretation and visualization of data in molecular biology.

Keywords: Bioinformatics, Biological Sequence Alignment, Smith-Waterman, Needleman-Wunsch, MATLAB, local alignment, global alignment

1. Introduction

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It is a union of biology and informatics: it involves technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins [1]. The emphasis is on the use of computers because most tasks in genomic data analysis are highly repetitive or mathematically complex. Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.
Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, genome-wide association studies and modeling of association. Bioinformatics has developed out of the need to understand the code of life, DNA. Massive DNA sequencing projects have evolved and added to the growth of the science of bioinformatics.

Biological sequence alignment is a widely used operation in the field of bioinformatics and computational biology. It aims to find out whether two or more biological sequences (e.g., DNA, RNA, or protein sequences) are related. This has many important real-world applications. For instance, if some information about one of the sequences is already known (e.g., the sequence represents a cancerous gene), then this information could be transferred to the other, unknown sequences, which could be vital in early disease diagnosis and drug engineering. Other applications include the study of evolutionary development and the history of species and their groupings.

As individual laboratories exchange more annotated biological data through comprehensive databases such as NCBI's retrieval system, Entrez (which integrates GenBank), researchers have recently become interested in detecting remote homologies by querying a sequence of interest against a subfamily of a distant lineage [2]. To unveil the structural or functional importance of an unknown sequence, one conducts, as an initial procedure, a sequence alignment within the framework of comparative computational biology. A sequence alignment is a way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
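As an illustration of the definition above, the following minimal sketch computes a global alignment with the Needleman-Wunsch recurrence. It is written in Python rather than the MATLAB used in this work. The function names `needleman_wunsch` and `column_marks`, as well as the scoring values (match +1, mismatch -1, linear gap penalty -2), are assumptions chosen for the example and are not the authors' implementation or parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    The default scoring values are illustrative assumptions only.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def column_marks(top, bottom):
    """Mark each alignment column: '|' match, 'x' mismatch, ' ' gap."""
    marks = []
    for p, q in zip(top, bottom):
        if p == "-" or q == "-":
            marks.append(" ")
        elif p == q:
            marks.append("|")
        else:
            marks.append("x")
    return "".join(marks)


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(top)
    print(column_marks(top, bottom))
    print(bottom)
    print("score:", s)
```

Each column of the printed alignment is a match, a mismatch, or a gap. When several alignments share the optimal score, the traceback order used here returns only one of them.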
The resulting alignment yields an edit transcript of mismatches and indels (insertions and deletions), in which mismatches can be interpreted as point mutations and gaps as indels. From this, we can infer how sequences with a common origin have diverged from one another.

2. DNA Alignment

Sequence comparison lies at the heart of bioinformatics analysis. As new biological sequences are being generated at exponential rates, sequence comparison is becoming increasingly important to draw

International Journal of Computer Science & Emerging Technologies IJCSET, E-ISSN: 2044 - 6004
Copyright © ExcelingTech, Pub, UK (http://excelingtech.co.uk/)