A Visual Analytics Approach to Exploring Protein Flexibility Subspaces

Scott Barlowe, UNC-Charlotte
Jing Yang*, UNC-Charlotte
Donald J. Jacobs, UNC-Charlotte
Dennis R. Livesay, UNC-Charlotte
Jamal Alsakran, Kent State University
Ye Zhao, Kent State University
Deeptak Verma, UNC-Charlotte
James Mottonen, UNC-Charlotte

ABSTRACT

Understanding what causes proteins to change shape and how the resulting shape influences function will expedite the design of more narrowly focused drugs and therapies. Shape alterations are often the result of flexibility changes in a set of localized neighborhoods that may or may not act in concert. Computational models have been developed to predict flexibility changes under varying empirical parameters. In this paper, we tackle a significant challenge facing scientists when analyzing outputs of a computational model, namely how to identify, examine, compare, and group interesting neighborhoods of proteins under different parameter sets. This is a difficult task since comparisons over protein subunits that comprise diverse neighborhoods are often too complex to characterize with a simple metric and too numerous to analyze manually. Here, we present a series of novel visual analytics approaches toward addressing this task. User scenarios illustrate the utility of these approaches and feedback from domain experts confirms their effectiveness.

Index Terms: I.3.6 [Computer Graphics]: Methodology and Techniques—Interaction techniques; J.3 [Computer Applications]: Life and Medical Sciences—Biology and genetics

1 INTRODUCTION

Deciphering how a protein's function changes as its three-dimensional shape is altered provides insight for making improved drugs and protein related therapies. A major factor influencing alterations in shape is the ability of protein subunits, or residues, to change shape.
Scientists can characterize this ability, referred to here as protein flexibility, with a variety of computational models that are based on the mechanical properties of residues under different parameterized conditions. Exploring the many model outputs that represent the mechanical variations influencing flexibility can be slow and may deter understanding. Difficulties include exploring the possible states for each parameterized condition and then linking their influence on local collections of residues, or protein subspaces, to global behavior.

* Corresponding author: jyang13@uncc.edu

To illustrate the challenge of protein subspace exploration, we present the following scenario of exploring allosteric response, namely how local protein regions behave in concert or in isolation to affect change across a protein. The study of allostery allows scientists to uncover the intricate and hidden relationships among protein subunits that are critical in altering a protein so that it exhibits desired behavior. In this scenario a computational model is used to simulate how residue flexibility changes as each residue is perturbed, and the changes are visually presented in plots (see Figure 1 for an example). Plots like these are common in other parts of bioinformatics, such as in molecular dynamics simulations [19], and present similar challenges:

Identifying subspaces of interest. A scientist begins by studying how parameters influence changes in residue flexibility. A good starting point is to simultaneously isolate and explore residue flexibility in a protein subspace and then compare the effect of different parameters. In the model outputs, such a subspace occupies the same location in different plots, each of which represents the flexibility for a single parameter combination. Subspaces may vary from a large sequence of residues to individual residues.
Through exploration, she wishes to identify a few subspaces that represent interesting behavior, such as those experiencing the greatest or least change in flexibility when parameters change. In particular, she is interested in the two subspaces that bound the opposite ends of the flexibility range.

Examining subspaces within neighborhoods. After identifying several subspaces of interest, the analyst includes neighboring residues in the analysis. Examining subspaces within their neighborhoods allows her to assess whether a parameter's effect on a single region extends to neighboring residues and to define more precisely the range of the region with interesting behavior. Additionally, it allows her to learn how the residues of interest are influenced by and act in concert with other residues.

Grouping subspaces. The analyst is now ready to examine residue similarity as parameter values are varied. Grouping similar residues and noting any changes in their similarity is useful for precisely locating where parameters have different effects on individual residues. For example, suppose that several residues are thought to be similar and group together for most parameter combinations. If a parameter combination is input to the model and the similarity changes noticeably for one or more residues, the analyst can isolate the residues with decreased similarity. Then, the isolated residues can be studied more closely under the latter parameter set to understand the differences in flexibility. Additionally, this information can be used to test the correctness of the model against domain knowledge.

Similar scenarios can be found in numerous other applications, such as in the study of the effect of mutations or substrate binding, as well as in the use of covariance plots from molecular dynamics simulations. These tasks are difficult with current tools.
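
The first task in the scenario, finding the two subspaces that bound the opposite ends of the flexibility range, can be illustrated with a minimal sketch. It assumes, purely for illustration, that each parameter combination yields a single flexibility value per residue and that a subspace is a contiguous window of residues. The function names, the window width, the spread measure (maximum minus minimum of the window mean across parameter combinations), and the data are all hypothetical and are not taken from the model or system described here.

```python
from typing import List, Tuple

def window_change(profiles: List[List[float]], start: int, width: int) -> float:
    """Spread (max - min) of a window's mean flexibility across parameter sets.

    profiles[k][i] is the (assumed) flexibility of residue i under the
    k-th parameter combination.
    """
    means = [sum(p[start:start + width]) / width for p in profiles]
    return max(means) - min(means)

def rank_windows(profiles: List[List[float]], width: int) -> List[Tuple[float, int]]:
    """Return (spread, start index) pairs sorted from least to greatest change."""
    n_residues = len(profiles[0])
    scored = [
        (window_change(profiles, start, width), start)
        for start in range(n_residues - width + 1)
    ]
    scored.sort()
    return scored

# Synthetic flexibility profiles: 3 parameter combinations x 6 residues.
profiles = [
    [0.1, 0.2, 0.2, 0.5, 0.6, 0.1],
    [0.1, 0.2, 0.3, 0.9, 1.0, 0.1],
    [0.1, 0.2, 0.1, 0.1, 0.2, 0.1],
]

ranked = rank_windows(profiles, width=2)
least_change = ranked[0]   # window whose flexibility barely responds to parameters
most_change = ranked[-1]   # window whose flexibility responds the most
print("least change: start", least_change[1], "spread", round(least_change[0], 3))
print("most change:  start", most_change[1], "spread", round(most_change[0], 3))
```

A single number per window is exactly the kind of summary metric that can hide underlying relationships, so a ranking like this is at best a starting point for visual exploration rather than a replacement for it.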
A single protein can often have hundreds of residues, and the flexibility of each residue may vary with each possible parameter combination. The size of subspaces and the influence of neighboring regions can also vary. Computational methods applied during analysis often result in summary metrics that hide underlying relationships. Parameter optimization tools are ineffective because neither the optimal conditions nor the parameter values that result in those conditions are known beforehand. Visualization tools are limited by the enforcement of a strict ordering along domain plot axes and the lack of functions for including neighborhoods in detailed analysis.

In this paper we present a novel approach to help scientists more efficiently and effectively investigate protein subspaces. Our approach is built upon an existing system that aims to ease discovery in protein flexibility outputs, a framework called WaveMap