Int. J Comp Sci. Emerging Tech Vol-2 No 5 October, 2011
Biological Sequence Alignment for
Bioinformatics Applications Using MATLAB
Sonali Vijan (1) and Rajesh Mehra (2)
(1) Student, Electronics Engineering, NITTTR, Chandigarh
(2) Faculty Members, Electronics Engineering,
#3290, Sector 35 D, Chandigarh
Email: sonali.vijan@gmail.com
Abstract: Biological sequence alignment is a widely used
operation in bioinformatics and computational biology
because it determines the similarity between biological
sequences. This paper uses the two basic alignment
algorithms: Smith-Waterman for local alignment and
Needleman-Wunsch for global alignment. The algorithms
have been developed and simulated using MATLAB for
genome analysis and sequence alignment. Local and global
alignments are presented, and the results are shown as
dot plots and as local and global scores for the
sequences. The proposed work is a useful tool that can
aid in the exploration, interpretation and visualization
of data in the field of molecular biology.
Keywords: Bioinformatics, Biological Sequence Alignment,
Smith-Waterman, Needleman-Wunsch, MATLAB, local
alignment, global alignment
1. Introduction
Bioinformatics is an interdisciplinary research area at
the interface between computer science and biological
science. It is a union of biology and informatics as it
involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information
related to biological macromolecules such as DNA,
RNA and proteins [1]. The emphasis here is on the use
of computers because most of the tasks in genomic data
analysis are highly repetitive or mathematically
complex. Common activities in bioinformatics include
mapping and analyzing DNA and protein sequences,
aligning different DNA and protein sequences to
compare them, and creating and viewing 3-D models of
protein structures. Major research efforts in the field
include sequence alignment, gene finding, genome
assembly, drug design, drug discovery, protein structure
alignment, protein structure prediction, genome-wide
association studies and modeling of association.
Bioinformatics has developed out of the need to
understand the code of life, DNA. Massive DNA
sequencing projects have evolved and contributed to the
growth of the science of bioinformatics.
Biological sequence alignment is a widely used
operation in the fields of bioinformatics and
computational biology. It aims to determine whether two
or more biological sequences (e.g., DNA, RNA, or protein
sequences) are related. This has many important
real-world applications. For instance, if some
information about one of the sequences is already
known (e.g., the sequence represents a cancerous gene),
then this information can be transferred to the other,
unknown sequences, which could be vital in early
disease diagnosis and drug engineering. Other
applications include the study of evolutionary
development and the history of species and their
groupings.
As individual laboratories exchange more annotated
biological data through comprehensive databases such
as NCBI’s retrieval system, Entrez (which integrates
GenBank), researchers have recently become
interested in detecting remote homologies by querying a
sequence of interest against a subfamily of a distant
lineage [2]. To uncover the structural or functional
importance of an unknown sequence, one first conducts
a sequence alignment within the framework of
comparative computational biology. A
sequence alignment is a way of arranging the primary
sequences of DNA, RNA, or protein to identify the
regions of similarity that may be a consequence of
functional, structural, or evolutionary relationships
between the sequences. The resulting alignment yields
an edit transcript of mismatches and indels, i.e.,
insertions and deletions, where mismatches can be
interpreted as point mutations and gaps as indels. As a
result, we can infer how sequences with the same origin
have diverged from one another.
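To make the notion of an edit transcript concrete, the following is a minimal Python sketch of a Needleman-Wunsch style global alignment with traceback. It is an illustration only and is not the MATLAB implementation developed in this paper. The function name needleman_wunsch, the linear gap model, the scoring values (match = 1, mismatch = -1, gap = -2) and the transcript symbols (M for match, X for mismatch, D for a deletion from the first sequence, I for an insertion from the second) are all assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b, transcript).
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b (deletion)
                score[i][j - 1] + gap,      # gap in a (insertion)
            )

    # Trace back from the bottom-right cell, preferring the diagonal move.
    i, j = n, m
    out_a, out_b, ops = [], [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                ops.append("M" if a[i - 1] == b[j - 1] else "X")
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            ops.append("D")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            ops.append("I")
            j -= 1

    return (
        score[n][m],
        "".join(reversed(out_a)),
        "".join(reversed(out_b)),
        "".join(reversed(ops)),
    )


# Usage: one base of the first sequence has no partner in the second.
total, top, bottom, transcript = needleman_wunsch("ACGT", "AGT")
print(total)       # 1
print(top)         # ACGT
print(bottom)      # A-GT
print(transcript)  # MDMM
```

In this small example the gap opposite C is read as an indel, while a pair such as ACGT and AGGT would instead produce the transcript MXMM, where X marks a position that can be interpreted as a point mutation.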
2. DNA Alignment
Sequence comparison lies at the heart of
bioinformatics analysis. As new biological sequences
are being generated at exponential rates, sequence
comparison is becoming increasingly important to draw
___________________________________________________________________________________
International Journal of Computer Science & Emerging Technologies
IJCSET, E-ISSN: 2044 - 6004
Copyright © ExcelingTech, Pub, UK (http://excelingtech.co.uk/)