Data Mining of Supersecondary Structure Homology between Light Chains of Immunoglobulins and MHC Molecules: Absence of the Common Conformational Fragment in the Human IgM Rheumatoid Factor

Hiroshi Izumi,*,† Akihiro Wakisaka,† Laurence A. Nafie,‡,§ and Rina K. Dukor§

†National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba West, 16-1 Onogawa, Tsukuba, Ibaraki 305-8569, Japan
‡Department of Chemistry, Syracuse University, Syracuse, New York 13244-4100, United States
§BioTools, Inc., 17546 SR 710 (Bee Line Hwy), Jupiter, Florida 33458, United States

Supporting Information

ABSTRACT: It is shown that fuzzy search and data mining techniques of supersecondary structure homology for subunits of proteins using conformational code patterns of α-helix-type (3β5α4β) and β-sheet-type (6α4β4β) fragments can be used to extract correlations between fragments of MHC class I molecules and the light chain of immunoglobulins. In comparison with the DSSP method, the new method of conformational pattern analysis with fuzzy search of structural code homology reflects the shape of the main chain well, rather than the secondary structure. Further, the data mining technique using the combination of h- and s-fragment patterns can quantify the supersecondary structure homology between any subunits of proteins with different amino acid sequences. Characteristic fragment patterns (string shhshss), which were sandwiched between two identical amino acid residues, His and Pro, were found in light chains of various types of immunoglobulins, the α-chain and β-2 microglobulin of MHC class I, and the α-chain and β-chain of MHC class II, but not in heavy chains of Fab immunoglobulin fragments and T cell receptors (TCR). Leukocyte immunoglobulin-like receptors (LILR) are related by the conformational fragment (string shhshss) to β-2 microglobulins as a type of paired form (string sohsss).
Further, human IgM rheumatoid factor, one of the immunoglobulins, did not strongly exhibit the conformational fragment pattern. The nonclassic MHC class I molecules CD1D, MIC-A, and MIC-B, which function to activate NKT, NK, and T cells, also did not clearly show the patterns. These code-driven mining techniques can be utilized as a metadata-generating tool for systems biology to elucidate the biological function of such conformational fragments of MHC I and II molecules, which come in contact with various signal ligands on the surface of T cells and natural killer cells.

INTRODUCTION

Major histocompatibility complex (MHC) class I 1,2 and class II 3 molecules are the key proteins for organism self-recognition and are polymorphic, which allows defense against a great diversity of microbes. For example, natural killer (NK) cells can recognize and kill tumor cells lacking self-markers, such as MHC class I, but the basis for this recognition is not completely understood. 2 Several common autoimmune diseases, such as rheumatoid arthritis, are deeply related to MHC class II and other immune modulators. 3 The polymorphisms of amino acid sequences and molecular structures of MHC molecules and immunoglobulins are confusing and make it very difficult to analyze structural homology and structural change from the amino acid sequences alone. Further, no effective method currently exists for comparing the supersecondary structure homology of many proteins. Therefore, we have developed data mining techniques based on backbone conformations to analyze the supersecondary structure homology of proteins with different amino acid sequences. Previously, we have proposed a conformational code for the description of conformations of all kinds of chemical compounds based on structural analysis using vibrational circular dichroism (VCD) of chiral bioactive compounds.
4-7 The conformational code consists of the combination of the codes of regional angle locations and conformational elements (Figure 1), and the conformational elements representing the classification of dihedral angles are substituted for the symbols indicating the bond locations (alphabets of angle locations). 6 For example, the conformational elements 1, 2, 3, 4, 5, and 6 correspond to the conformational terms, T (trans), G+ (+gauche), G- (-gauche), sp (synperiplanar), +ac (+anticlinal),

Received: September 3, 2012. Published: February 10, 2013. J. Chem. Inf. Model. 2013, 53, 584-591. dx.doi.org/10.1021/ci300420d. pubs.acs.org/jcim. © 2013 American Chemical Society.