Mu-8: Visualizing Differences between a Protein and its Family

John Mercer* (Harvard University, Broad Institute)
Balaji Pandian† (Harvard University)
Nicolas Bonneel‡ (Harvard University)
Alexander Lex§ (Harvard University)
Hanspeter Pfister¶ (Harvard University)

[Figure 1 image. Histogram titles: Alpha Helix & Turn Propensity, Beta Sheet Propensity, Composition, Hydrophobicity, Physico-Chemical Properties, Other Characteristics. Axis label: Distance (Unit: Angstrom). Annotated views: Score Histograms, Proximity Chords, 3D Structure, Focus Sequence, Context Sequence, Sequence Proximity Histogram, Conservation Heat Map.]

Figure 1: The annotated Mu-8 interface showing how characteristics of a defective protein compare to its functional family.

ABSTRACT

A complete understanding of the relationship between the amino acid sequence and resulting protein function remains an open problem in the biophysical sciences. Current approaches often rely on diagnosing functionally relevant mutations by determining whether an amino acid frequently occurs at a specific position within the protein family. These methods, however, fail to appropriately account for the biophysical properties and the 3D structure of the protein. To remedy this, we have developed an interactive visualization technique, Mu-8, that provides researchers with a holistic view of the residues that have significantly different characteristics from a family of homologous yet functional proteins.
Mu-8 enables analysts to identify regions of the sequence that have biophysical anomalies, while clearly communicating the spatial relationships amongst residues.

Index Terms: Protein Function, Genetic Variants, Amino Acid Indices, Biological Visualization.

* e-mail: mercer@broadinstitute.org
† e-mail: balajipandian@college.harvard.edu
‡ e-mail: nbonneel@seas.harvard.edu
§ e-mail: alex@seas.harvard.edu
¶ e-mail: pfister@seas.harvard.edu

1 INTRODUCTION

Proteins are biochemical products that perform specific functions in a cell or an organism. A protein is made of a sequence of amino acids (also referred to as residues) that are coded for by genes. Proteins perform vital roles, including metabolic processes and housekeeping tasks such as DNA replication. A protein derives its function from its three-dimensional structure (the tertiary structure), which is in turn driven by the biochemical properties of its amino acid sequence (the primary structure). Understanding and being able to predict the 3D structure from the amino acid sequence, however, is part of the unsolved protein-folding problem [4].

While a general solution to this problem is not within reach of current methods, interactive visualization and computational analysis can help biologists understand the relationship between the amino acid sequence and a protein's 3D structure. This in turn will enable analysts to predict which mutations in an amino acid sequence cause the loss of function in a protein. Motivated by the problem and the data published for the 2013 IEEE BioVis Data Contest¹, we have developed Mu-8, a novel, interactive visualization tool for comparing proteins to their family and for identifying potential regions that cause functional breakdown.

Different or altered proteins can often fulfill the same function, albeit with different efficiency. Such proteins are referred to as a protein family and are mostly evolutionarily related.
This demonstrates that function is often preserved even if the amino acid sequence is changed. On the other hand, small changes to the sequence can sometimes cause function to break down. The challenge posed by the BioVis Data Contest is to find out which mutation(s) in a highly mutated amino acid sequence cause this functional breakdown.

¹ http://biovis.net/year/2013/info/contest

Using Mu-8, an analyst can: (1) quickly identify residues or regions of residues that are significantly different from the family with respect to one or more characteristics; (2) identify whether such a region is in an otherwise highly conserved