Open Access Research Article
Sohpal et al., J Data Mining Genomics Proteomics 2013, S3
DOI: 10.4172/2153-0602.S3-001
J Data Mining Genomics Proteomics, Special Issue on Genome Annotation, ISSN: 2153-0602, JDMGP, an open access journal

Keywords: Sequence similarity; Substitution matrix; Triplex capsid protein; Human Herpes Simplex Virus (HHV); ANFIS

Introduction

Sequence alignment is the most fundamental technique in bioinformatics for establishing evolutionary relationships between different biomolecules and biological species. Sequence comparison also offers a basis for medical diagnosis and drug development. There are many computational models and approaches that can be applied to sequence alignment. These models can be classified on the basis of algorithms and alignment techniques. Dynamic programming methods using the Needleman-Wunsch (N-W) and Smith-Waterman (S-W) approaches are prominent in pairwise sequence alignment. Sequence alignment systems and variables (gap penalty, gap extension and substitution matrices) are intrinsically fuzzy, as their properties and behaviors contain uncertainty. Fuzzy logic and modeling are well suited to describing sequence alignment and provide robust tools for optimization of these variables. In addition, it has been shown that exact or optimal solutions have significant limitations in sequence alignment problems. Applications of fuzzy concepts and approaches have also been growing in sequence alignment and phylogenetic analysis to overcome this randomness.

Zhang et al. [1] evaluated several validity measures in fuzzy clustering and developed a new measure for a fuzzy c-means algorithm that uses a Pearson correlation in its distance metric. They observed that the newly developed measure could be used to assess the validity of fuzzy clusters produced by a correlation-based fuzzy c-means clustering algorithm. Garcia et al.
[2] proposed FISim, a similarity measure between PFMs, based on the fuzzy integral of the distance of the nucleotides with respect to the information content of the positions. FISim provides excellent results when dealing with sets of randomly generated motifs. Mansoori et al. [3] proposed a fuzzy rule-based classifier for assigning amino acid sequences to different super-families of proteins. The obtained results show that the generated fuzzy rules are more interpretable, with an acceptable improvement in classification accuracy. Bidargaddi et al. [4] proposed a fuzzy profile HMM to overcome the limitations, and to achieve an improved alignment for protein sequences belonging to a given family. The strong correlations and the sequence preference involved in protein structures make this fuzzy-architecture-based model a suitable candidate for building profiles of a given family. Espadaler et al. [5] introduced a computational approach for the annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. They observed that this method could increase by 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone. Gomez et al. [6] described a new method that predicts the putative function of a protein by integrating the results from the PSI-BLAST program and a fuzzy logic algorithm. Collyda et al. [7] dealt with phylogenetic analysis of protein and gene data, using multiple sequence alignments produced by fuzzy profile Hidden Markov Models. The results of the analysis are compared against those obtained by the classical profile HMM, and indicate the superiority of the fuzzy profile HMM in this field. Brylinski et al. [8] described a computational model that can be used to identify potential areas of a protein that are able to interact with other molecules (ligands, substrates and inhibitors). Samsonova et al.
[9] proposed a rule-based characterization of olfactory receptors derived from a multiple sequence alignment of human GPCRs. They concluded that seven alignment sites are sufficient to characterize 99% of human olfactory GPCRs. Huang et al. [10] proposed an efficient nonparametric classifier for predicting enzyme subfamily class, using an adaptive fuzzy k-nearest neighbor (AFK-NN) method, where k and a fuzzy strength parameter m are adaptively specified. The accuracy of AFK-NN on the

*Corresponding author: Vipan Kumar Sohpal, Department of Chemical and Bio Technology, Beant College of Engineering and Tech, Gurdaspur, Punjab, India, E-mail: vipan752002@gmail.com

Received March 04, 2013; Accepted March 27, 2013; Published April 04, 2013

Citation: Vipan Kumar S, Dey A, Singh A (2013) N-W Algorithm and ANFIS Modeling on Alignment Similarity of Triplex Capsid Protein of Human Herpes Simplex Virus. J Data Mining Genomics Proteomics S3: 001. doi:10.4172/2153-0602.S3-001

Copyright: © 2013 Vipan Kumar S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

N-W Algorithm and ANFIS Modeling on Alignment Similarity of Triplex Capsid Protein of Human Herpes Simplex Virus

Vipan Kumar Sohpal1*, Apurba Dey2 and Amarpal Singh3
1 Department of Chemical and Bio Technology, Beant College of Engineering and Tech, Gurdaspur, Punjab, India
2 Department of Bio Technology, National Institute of Technology, Durgapur, West Bengal, India
3 Department of Electronics and Communication Engg, Beant College of Engineering and Tech, Gurdaspur, Punjab, India

Abstract

Optimal sequence similarity of the triplex capsid proteins of human herpes simplex virus (HHV) is a complex bioinformatics problem, which is controlled by the alignment algorithm, substitution matrix, gap penalty and gap extension.
A precise choice of mutation matrix is required to optimize the alignment similarity, and an appropriate computational approach is required for the similarity search. The present paper uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) approach to model and simulate the alignment similarity for PAM and BLOSUM substitution matrices. The mutation matrix and the sequences of HHV-I and HHV-II were taken as the model's input parameters. The model combines fuzzy inference, an artificial neural network, and a set of fuzzy rules developed directly from computational analysis using the N-W algorithm. The proposed modeling approach is verified by comparing the expected results with the observed practical results obtained by computational analysis under specific conditions. Application of the ANFIS test shows that the substitution matrix predicted by the proposed model is fully in agreement with the experimental values at the 0.5% level of significance.

Journal of Data Mining in Genomics & Proteomics, ISSN: 2153-0602
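
The setup described in the abstract (an N-W global alignment scored by a substitution matrix and gap penalties, then summarized as a similarity value) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single linear gap penalty rather than separate gap-opening and gap-extension costs, a toy match/mismatch function (`toy_score`) in place of a PAM or BLOSUM matrix, and percent identity as a simple stand-in for alignment similarity. All function names here are hypothetical.

```python
def needleman_wunsch(a, b, score, gap=-2):
    """Globally align sequences a and b.

    score(x, y) is a substitution score (integer-valued here, so that the
    equality checks in the traceback are exact); gap is a linear gap penalty.
    Returns (alignment_score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                F[i - 1][j] + gap,                            # gap in b
                F[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def toy_score(x, y):
    """Toy substitution scheme (assumption): +1 for a match, -1 otherwise."""
    return 1 if x == y else -1


def percent_identity(aligned_a, aligned_b):
    """Share of alignment columns holding identical, non-gap residues."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    s, al_a, al_b = needleman_wunsch("ACGT", "AGT", toy_score, gap=-2)
    print(s, al_a, al_b, percent_identity(al_a, al_b))
```

Swapping `toy_score` for a lookup into a real PAM or BLOSUM table, and varying `gap`, would give the kind of matrix-versus-similarity data the abstract describes as model input; an affine gap model would need the usual three-matrix extension of this recurrence.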