Journal of Biomedical Engineering and Technology, 2013, Vol. 1, No. 2, 26-30
Available online at http://pubs.sciepub.com/jbet/1/2/2
© Science and Education Publishing
DOI:10.12691/jbet-1-2-2

Mining Quantitative Association Rules in HIV Protein Sequences

Anubha Dubey 1,*, Usha Chouhan 2

1 Department of Bioinformatics, Manit, Bhopal (M.P), India
2 Department of Mathematics, Manit, Bhopal (M.P), India
*Corresponding author: anubhadubey@rediffmail.com

Received July 11, 2013; Revised August 02, 2013; Accepted August 05, 2013

Abstract

Although a great deal of research has gone into understanding the composition and nature of proteins, much remains to be understood satisfactorily. It is now generally believed that the amino acid sequences of proteins are not random, and thus the patterns of amino acids observed in protein sequences are also non-random. In this study, we attempt to decipher the nature of the associations between the different amino acids present in an HIV protein. This basic analysis provides insights into the co-occurrence of certain amino acids in an HIV protein. Such association rules are desirable for enhancing our understanding of protein composition, and they may give clues about the global interactions among particular sets of amino acids occurring in proteins. The aim of association rule mining is to reveal underlying interactions in large sets of data items. Knowledge of these rules or constraints is highly desirable for the in-vitro synthesis of artificial proteins. It will also give new insights into protein-protein interactions in HIV.

Keywords: data mining, quantitative association rule mining, protein composition.

Cite This Article: Dubey, Anubha, and Usha Chouhan, “Mining Quantitative Association Rules in HIV Protein Sequences.” Journal of Biomedical Engineering and Technology 1, no. 2 (2013): 26-30. doi: 10.12691/jbet-1-2-2.

1.
Introduction

Proteins are important constituents of the cellular machinery of any organism. Recombinant DNA technologies have provided tools for the rapid determination of DNA sequences and, by inference, the amino acid sequences of proteins from structural genes [1]. Proteins are sequences made up of 20 types of amino acids. Each amino acid is represented by a single-letter code, as given in Table 1. Each protein adopts a unique three-dimensional structure, which is determined entirely by its amino acid sequence. A slight change in the sequence might completely change the functioning of the protein. Just as the letters of the alphabet can be combined to form an almost endless variety of words, amino acids can be linked together in varying sequences to form a vast variety of proteins [13].

Table 1. Single letter codes of amino acids

| S.No. | AA code | Full name | Side chain polarity | Side chain charge | Hydropathy Index |
|---|---|---|---|---|---|
| 1 | A | Alanine | nonpolar | neutral | 1.8 |
| 2 | C | Cysteine | nonpolar | neutral | 2.5 |
| 3 | D | Aspartic acid | polar | negative | -3.5 |
| 4 | E | Glutamic acid | polar | negative | -3.5 |
| 5 | F | Phenylalanine | nonpolar | neutral | 1.9 |
| 6 | G | Glycine | nonpolar | neutral | -0.4 |
| 7 | H | Histidine | polar | positive | -3.2 |
| 8 | I | Isoleucine | nonpolar | neutral | 4.5 |
| 9 | K | Lysine | polar | positive | -3.9 |
| 10 | L | Leucine | nonpolar | neutral | 3.8 |
| 11 | M | Methionine | nonpolar | neutral | 1.9 |
| 12 | N | Asparagine | polar | neutral | -3.5 |
| 13 | P | Proline | nonpolar | neutral | -1.6 |
| 14 | Q | Glutamine | polar | neutral | -3.6 |
| 15 | R | Arginine | polar | positive | -3.5 |
| 16 | S | Serine | polar | neutral | -0.8 |
| 17 | T | Threonine | polar | neutral | -0.8 |
| 18 | V | Valine | nonpolar | neutral | 4.2 |
| 19 | W | Tryptophan | nonpolar | neutral | -0.9 |
| 20 | Y | Tyrosine | polar | neutral | -1.3 |
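As an illustration only, the single-letter encoding of Table 1 could be held in a small lookup table and used to summarise the amino acid composition of a sequence. The Python sketch below is not the authors' method. The names `AMINO_ACIDS`, `composition` and `polarity_fractions` are hypothetical, and the peptide `MKKLAD` is a made-up toy string rather than an HIV sequence. The polarity and charge labels are transcribed from Table 1.

```python
from collections import Counter

# One-letter code -> (full name, side-chain polarity, side-chain charge),
# transcribed from Table 1. The dictionary name is an assumption.
AMINO_ACIDS = {
    "A": ("Alanine", "nonpolar", "neutral"),
    "C": ("Cysteine", "nonpolar", "neutral"),
    "D": ("Aspartic acid", "polar", "negative"),
    "E": ("Glutamic acid", "polar", "negative"),
    "F": ("Phenylalanine", "nonpolar", "neutral"),
    "G": ("Glycine", "nonpolar", "neutral"),
    "H": ("Histidine", "polar", "positive"),
    "I": ("Isoleucine", "nonpolar", "neutral"),
    "K": ("Lysine", "polar", "positive"),
    "L": ("Leucine", "nonpolar", "neutral"),
    "M": ("Methionine", "nonpolar", "neutral"),
    "N": ("Asparagine", "polar", "neutral"),
    "P": ("Proline", "nonpolar", "neutral"),
    "Q": ("Glutamine", "polar", "neutral"),
    "R": ("Arginine", "polar", "positive"),
    "S": ("Serine", "polar", "neutral"),
    "T": ("Threonine", "polar", "neutral"),
    "V": ("Valine", "nonpolar", "neutral"),
    "W": ("Tryptophan", "nonpolar", "neutral"),
    "Y": ("Tyrosine", "polar", "neutral"),
}


def _checked(sequence):
    """Upper-case the sequence and reject empty input or unknown codes."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return sequence


def composition(sequence):
    """Fraction of each of the 20 amino acids in a one-letter sequence."""
    sequence = _checked(sequence)
    counts = Counter(sequence)
    return {code: counts[code] / len(sequence) for code in AMINO_ACIDS}


def polarity_fractions(sequence):
    """Fraction of polar and nonpolar residues, using the Table 1 labels."""
    sequence = _checked(sequence)
    counts = Counter(AMINO_ACIDS[code][1] for code in sequence)
    return {label: counts[label] / len(sequence)
            for label in ("polar", "nonpolar")}


if __name__ == "__main__":
    toy_peptide = "MKKLAD"  # hypothetical example, not taken from the paper
    print(composition(toy_peptide)["K"])
    print(polarity_fractions(toy_peptide))
```

Per-sequence summaries of this kind are one plausible starting point for a table of quantitative attributes, although the attributes actually mined in the study may differ.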