Applying Hidden Markov Model to Protein Sequence Alignment

Er. Neeshu Sharma #1, Er. Dinesh Kumar *2, Er. Reet Kamal Kaur #3
# CSE, PTU
#1 RIMT-MAEC, #3 RIMT-MAEC
CSE, PTU
DAVIET, Jallandhar

Abstract---The hidden Markov model (HMM) is a statistical tool widely used to study protein alignments and to perform profile analysis of a set of proteins. Finite state machines such as HMMs move through a series of states and produce output either when the machine reaches a particular state or when it moves from one state to another. An HMM generates a protein sequence by emitting amino acids as it progresses through a series of states. Multiple sequence alignment (MSA) is a powerful technique used in almost all applications of modern bioinformatics systems. The biomedical methods and algorithms used in MSA are of great importance in solving a series of related biological problems. The HMM approach is a well-known and widely used statistical method for characterizing the spectral properties of the residues of a genomic or proteomic pattern. Profile HMMs have proved to offer a robust solution for MSA.

Keywords---Hidden Markov Model (HMM), Multiple Sequence Alignment, Pairwise Sequence Alignment, Alignment, Profile HMM

1. INTRODUCTION

Sequence alignment is a way of writing one sequence on top of another such that the residues in each position are assumed to have a common evolutionary origin. If the same letter occurs in both sequences, that position has been conserved in evolution. If the letters differ, it is assumed that the two derive from a common ancestral letter. Similar sequences may have different lengths, which is generally explained by insertions or deletions in the sequences. Thus, a letter or a stretch of letters may be paired with dashes in the other sequence to signify such an insertion or deletion. Since an insertion in one sequence can always be seen as a deletion in the other, the term "indel" is frequently used to cover both.
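The column-by-column reading of an alignment described above can be illustrated with a short Python sketch. It takes two sequences that are already aligned and labels each column as conserved, substituted, or indel. The function name classify_columns, the three labels, the choice of "-" as the gap symbol, and the toy sequences are assumptions made for illustration only; none of them come from the paper.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol, matching the dashes used in written alignments


def classify_columns(top: str, bottom: str) -> list[str]:
    """Label every column of an already-aligned pair of sequences."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    labels = []
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot pair a gap with a gap")
        if a == GAP or b == GAP:
            # A residue paired with a dash: insertion in one sequence,
            # or equivalently a deletion in the other.
            labels.append("indel")
        elif a == b:
            # Same letter in both sequences: position conserved.
            labels.append("conserved")
        else:
            # Different letters: both assumed to derive from an ancestral letter.
            labels.append("substituted")
    return labels


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the paper.
    top = "MKV-LA"
    bottom = "MKITL-"
    labels = classify_columns(top, bottom)
    for a, b, label in zip(top, bottom, labels):
        print(a, b, label)
    print(dict(Counter(labels)))
```

The sketch only interprets a given alignment; finding a good alignment in the first place is the harder problem that the alignment methods address.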
There are two main areas of sequence alignment: pairwise sequence alignment and multiple sequence alignment.

1.1. Pairwise Sequence Alignment

A pairwise sequence alignment is an arrangement of two DNA or amino acid sequences that shows where the two sequences are similar and where they differ. Broadly, there are three categories of methods for sequence comparison.

Segment methods compare all windows (overlapping segments of a predetermined length, e.g., 10 amino acids) from one sequence to all segments from the other. This is the approach used in dot plots [18].

Optimal global alignment methods obtain the best overall score for the comparison of the two sequences, including a consideration of gaps.

Global: all positions are aligned
CA--GATTCGAAT
CGCCGATT---AT

Optimal local alignment algorithms seek to identify the best local similarities between two sequences but, unlike segment methods, include explicit consideration of gaps.

Local: a (contiguous) subset of positions is aligned [18]
..GATT.....
....GATT..

Based on the differences between the two sequences, one can calculate the "cost" of aligning them in terms of replacements, deletions, and insertions, and assign a similarity score [18].

1.2. Multiple Sequence Alignment

Multiple sequence alignment [18] aims to find similarities among many sequences, so that all similar sequences can be compared in a single figure or table. The basic idea is that the sequences are aligned on top of each other, so that a coordinate system is set up in which each row is the sequence for one protein and each column is the 'same' position in each
Er. Neeshu Sharma et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (3), 2011, 1031-1035