eResearch Solutions for High Throughput Structural Biology

Noel Faux 1,2, Anthony Beitz 3, Mark Bate 1, Abdullah A. Amin 1,2, Ian Atkinson 4, Colin Enticott 3, Khalid Mahmood 1,2, Matthew Swift 3, Andrew Treloar 3, David Abramson 3, James C. Whisstock 1,2, Ashley M. Buckle 1

1 The Department of Biochemistry and Molecular Biology
2 The ARC Centre of Excellence in Structural and Functional Microbial Genomics
Faculty of Medicine, 3 CSIT, Monash University, Clayton, Victoria 3800, Australia
4 High Performance Computing & School of Information Technology
James Cook University, Townsville, QLD, 4814, Australia
Email: Ashley.Buckle@med.monash.edu.au

Abstract

Structural biology research places significant demands upon computing and informatics infrastructure. Protein production, crystallization and X-ray data collection require solutions to data management, annotation, target tracking and remote experiment monitoring. Structure elucidation is computationally demanding and requires user-friendly interfaces to high-performance computing resources. Here we discuss how these challenges are being met at the Protein Crystallography Unit at Monash University. Specifically, we have developed informatics solutions for each stage in the structural biology pipeline, from DNA cloning through to protein structure determination. This infrastructure will be pivotal for accelerating the process of structural discovery and will be of significant interest to other laboratories worldwide.

1. Introduction

Proteins perform the functions necessary for life in all organisms. Protein function is to a large extent dictated by the 3-dimensional structure, and thus knowledge of the atomic structure of a protein is a prerequisite to understanding its function. The understanding of protein structure now has a firm role in the molecular basis of all diseases, and as such is a vital underpinning for the future promise of de novo drug design.

X-ray crystallography is the most common technique for the structure elucidation of proteins. Briefly, this method involves first the production of large amounts of (usually recombinant) pure protein, followed by crystallization and X-ray diffraction analysis. The atomic structure is then calculated from the diffraction pattern using one of several methods. Each stage of this process is technically challenging and a potential bottleneck. Over the last 5 years, adoption of automation technologies has eased the bottlenecks at the cloning, protein production and crystallization stages. Availability of synchrotron radiation has increased the rate at which high-quality diffraction data can be collected. Although computational methods of structure elucidation have also improved significantly, the high-throughput nature of the pipeline places an increasing emphasis on informatics and data management requirements for all stages in the process.

2. Protein Crystallography at Monash

Protein crystallography at Monash has grown considerably over the past 5 years (Figure 1). To date the unit includes six independent groups and over 100 researchers. In order to cope with the demand for crystallography, the unit has, where possible, deployed robotics to enhance throughput. Several factors will further increase this growth over the next 5 years: (1) Recent establishment of ProteinExpress in 2005, a High-Throughput protein