An Interdisciplinary Approach to Understanding and Supporting the Analytical Processes of High-Throughput Biological Data Analysis

Elijah Myers*, Chris North, Ruth Grene, Lenwood S. Heath, Eva Collakova, and Lecong Zhou
Virginia Polytechnic Institute and State University, Blacksburg, VA 24061

ABSTRACT
The abundance of high-throughput biological data (e.g., genomics, proteomics, metabolomics) in existing databases and literature has enabled systems-level viewpoints of biological data analysis. As visualization tools to support the analytical processes of biologists continue to emerge, questions remain as to the efficacy of existing visualization tools and the extent to which the analytical processes employed by biologists are understood by tool developers. Traditional interviewing and user observation techniques to address these questions during tool evaluation and development are inherently subject to certain shortcomings, namely, the inclusion of feedback from expert biologists at only some stages of the development/review process and the lack of an in vivo approach during user observation. To address these issues, we present an interdisciplinary approach to tool review and development with preliminary results regarding the identification of shortcomings in existing visualization tools, an understanding of the analytical processes employed by biologists, and observations on the utility that emerging technologies (i.e., large, high-resolution displays) provide in biological data analysis. We include a discussion of the results to address the benefits provided by the chosen approach, to provide a framework for the development of tools to support the analytical processes of biologists, and to explore the potential of large, high-resolution displays as a mitigating factor in supporting the analysis of high-throughput biological data.
KEYWORDS: Information visualization, visual analytics, high-throughput data analysis, interdisciplinary collaboration, large high-resolution displays.

INDEX TERMS: J.3 [Computer Applications]: Life and Medical Sciences - Biology and Genetics; H.5.2 [Information Interfaces and Presentation]: User Interfaces - Evaluation/Methodology

1 INTRODUCTION
The growing prevalence of high-throughput technologies in biology (genomics, transcriptomics, proteomics, and metabolomics) has led to an abundance of biological data in existing databases and the research literature [1, 13], as well as systems-level viewpoints in the analysis of high-throughput biological data. As data have accumulated, the need for software tools to aid in the analytical processes of biologists has increased, with the success of developed tools being contingent upon an understanding of the analytical processes that they are meant to support. As researchers in the fields of information visualization and visual analytics continue to seek this understanding, key questions to address include 1) how effective existing tools for the visualization of high-throughput biological data are in supporting high-throughput data analysis, and 2) the extent to which the analytical processes employed by expert biologists during analysis are understood. Approaches to address these questions have typically been straightforward, often involving professional biologists (or biology students) as the subjects of interviews to provide an analytical review of existing visualization tools and techniques [16] or as the subjects of user observation studies [15] to empirically evaluate the performance of visualization tools or techniques.
Problems inherent in common interviewing techniques are that the feedback elicited from biologists is limited to only certain stages of research or tool development (e.g., requirements analysis interviews, post-user-study interviews), and that biologists are relied upon to describe their own analytical processes, a task that may lead to a skewed understanding of the actual processes employed [7]. Problems inherent in common user study approaches are that biologists are often asked to interact with tools using fabricated data sets (or data sets unrelated to current research interests) and are often observed in laboratory settings, both of which diminish the benefits of an approach that preserves the natural analytical practices of participants.

The current research addresses these problems by providing an interdisciplinary approach to tool review and development, with key focuses being the continuous interaction among biologists and computer scientists at all stages of research, and the prolonged observation of biologists in a context most related to their current research practices and interests. Maintaining these focuses, we seek to identify the shortcomings present in existing visualization tools to support the analysis of biological data, to observe the analytical processes employed by biologists during the use of these tools, and to use this knowledge to develop a prototype that better supports the analysis of high-throughput biological data.
We provide preliminary results of the proposed research, including a presentation of the methods used to facilitate interdisciplinary collaboration (Section 2), the results of the tool review process (Section 3), and a discussion (Section 4) of the benefits of the implemented interdisciplinary approach, a framework for the development of tools to support the analytical processes of biologists, and the potential of large, high-resolution displays as a mitigating factor in supporting the analysis of high-throughput biological data. We conclude with a brief review and an overview of plans for future prototype development.

*Corresponding author. Email: esm2310@vt.edu

2 METHODS
The methods implemented seek to alleviate problems inherent in commonly used interviewing and user study techniques by providing a framework for continuous interdisciplinary collaboration that facilitates an understanding of the shortcomings that exist in current tools for the visualization of biological data,