Open Access Research Article
Sohpal et al., J Data Mining Genomics Proteomics 2013, S3
DOI: 10.4172/2153-0602.S3-001
J Data Mining Genomics Proteomics Special Issue on Genome Annotation ISSN: 2153-0602 JDMGP, an open access journal
Keywords: Sequence similarity; Substitution matrix; Triplex capsid
protein; Human Herpes Simplex Virus (HHV); ANFIS
Introduction
Sequence alignment is the most fundamental technique in
bioinformatics for establishing evolutionary relationships between
different biomolecules and biological species. Sequence comparison
also offers a basis for medical diagnosis and drug development. There
are many computational models and approaches that can be applied
to sequence alignment. These models can be classified on the basis
of algorithms and alignment techniques. Dynamic programming
methods using the Needleman-Wunsch (N-W) and Smith-Waterman (S-W)
approaches are prominent in pairwise
sequence alignment. Sequence alignment systems and variables (gap
penalty, gap extension and substitution matrices) are intrinsically fuzzy,
as their properties and behaviors contain uncertainty. Fuzzy logic
and modeling are well suited to describing sequence alignment and provide
robust tools for optimization of variables. In addition, it has been
shown that exact or optimal solutions have significant limitations in
sequence alignment problems. The applications of fuzzy concepts
and approaches have also been growing in sequence alignment and
phylogenetic analysis to overcome the randomness.
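The dynamic programming idea behind N-W global alignment can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function names are hypothetical, a simple match/mismatch score stands in for a PAM or BLOSUM substitution matrix, and a single linear gap penalty replaces the separate gap opening and extension penalties. All numeric defaults are assumptions chosen only for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b (illustrative scoring only).

    Assumptions: match/mismatch scores replace a PAM/BLOSUM lookup, and
    'gap' is a linear penalty (no separate opening/extension terms).
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Share of alignment columns with identical, non-gap residues."""
    same = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return 100.0 * same / len(aligned_a)
```

With these default scores, aligning "ACGT" with "AGT" yields the pair ACGT / A-GT with a score of 1 and 75% identity. Changing the scoring parameters changes both the alignment and the similarity, which is the kind of sensitivity to substitution matrix and gap settings that the text describes.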
Zhang et al. [1] evaluated several validity measures in fuzzy clustering
and developed a new measure for a fuzzy c-means algorithm that uses
a Pearson correlation in its distance metric. They observed that the newly
developed measure could be used to assess the validity of fuzzy clusters
produced by a correlation-based fuzzy c-means clustering algorithm.
Garcia et al. [2] proposed FISim, a similarity measure between position
frequency matrices (PFMs) based on the fuzzy integral of the distance of the nucleotides with
respect to the information content of the positions. FISim provides
excellent results when dealing with sets of randomly generated
motifs. Mansoori et al. [3] proposed a fuzzy rule-based classifier
for assigning amino acid sequences to different superfamilies of
proteins. The obtained results show that the generated fuzzy rules are
more interpretable, with an acceptable improvement in classification
accuracy. Bidargaddi et al. [4] proposed a fuzzy profile HMM to
overcome the limitations and to achieve an improved alignment for
protein sequences belonging to a given family. The strong correlations
and the sequence preference involved in protein structures make
this fuzzy-architecture-based model a suitable candidate for building
profiles of a given family. Espadaler et al. [5] introduced a computational
approach for annotation of enzymes, based on the observation
that similar protein sequences are more likely to perform the same
function if they share similar interacting partners. They observed that this
method could increase by 10% the specificity of genome-wide enzyme
predictions based on sequence matching by PSI-BLAST alone. Gomez
et al. [6] described a new method that predicts the putative function
of a protein by integrating the results from the PSI-BLAST program
and a fuzzy logic algorithm. Collyda et al. [7] dealt with phylogenetic
analysis of protein and gene data using multiple sequence alignments
produced by fuzzy profile Hidden Markov Models. The results of the
analysis were compared against those obtained by the classical profile
HMM and showed the superiority of the fuzzy profile HMM in
this field. Brylinski et al. [8] described a computational model that can
be used to identify potential areas that are able to interact with other
molecules (ligands, substrates and inhibitors). Samsonova et al. [9]
proposed a rule-based characterization of olfactory receptors derived
from a multiple sequence alignment of human GPCRs. They concluded
that seven alignment sites are sufficient to characterize 99% of human
olfactory GPCRs. Huang et al. [10] proposed an efficient nonparametric
classifier for predicting enzyme subfamily class using an adaptive fuzzy
k-nearest neighbor (AFK-NN) method, where k and a fuzzy strength
parameter m are adaptively specified. The accuracy of AFK-NN on the
*Corresponding author: Vipan Kumar Sohpal, Department of Chemical and Bio
Technology, Beant College of Engineering and Tech, Gurdaspur, Punjab, India,
E-mail: vipan752002@gmail.com
Received March 04, 2013; Accepted March 27, 2013; Published April 04, 2013
Citation: Vipan Kumar S, Dey A, Singh A (2013) N-W Algorithm and ANFIS
Modeling on Alignment Similarity of Triplex Capsid Protein of Human Herpes
Simplex Virus. J Data Mining Genomics Proteomics S3: 001. doi:10.4172/2153-0602.S3-001
Copyright: © 2013 Vipan Kumar S, et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the
original author and source are credited.
N-W Algorithm and ANFIS Modeling on Alignment Similarity of Triplex
Capsid Protein of Human Herpes Simplex Virus
Vipan Kumar Sohpal1*, Apurba Dey2 and Amarpal Singh3
1Department of Chemical and Bio Technology, Beant College of Engineering and Tech, Gurdaspur, Punjab, India
2Department of Bio Technology, National Institute of Technology, Durgapur, West Bengal, India
3Department of Electronics and Communication Engg, Beant College of Engineering and Tech, Gurdaspur, Punjab, India
Abstract
Optimal sequence similarity of triplex capsid proteins of human herpes simplex virus (HHV) is a complex
bioinformatics problem, which is controlled by the alignment algorithm, substitution matrix, gap penalty and gap extension.
A precise choice of mutation matrix is required to optimize the alignment similarity, and an appropriate computational
approach is required for similarity search. The present paper uses an Adaptive Neuro-Fuzzy Inference System (ANFIS)
approach to model and simulate the alignment similarity for PAM and BLOSUM substitution matrices. The mutation matrix
and the sequences of HHV-I and HHV-II were taken as the model's input parameters. The model combines fuzzy
inference, an artificial neural network, and a set of fuzzy rules developed directly from computational analysis
using the N-W algorithm. The proposed modeling approach is verified by comparing the expected results with the observed
practical results obtained by computational analysis under specific conditions. The application of the ANFIS test shows that
the substitution matrix predicted by the proposed model is fully in agreement with the experimental values at the 0.5% level
of significance.
Journal of
Data Mining in Genomics & Proteomics