Strategies for Data Flow and Storage for High Throughput, High Resolution Cryo-EM Data Collection

William J. Rice 1,2*, Anchi Cheng 1,2, Sargis Dallakyan 1, Swapnil Bhatkar 1, Shaker Krit 1, Edward T. Eng 1,2, Bridget Carragher 1,2,3 and Clinton S. Potter 1,2,3

1. National Resource for Automated Molecular Microscopy, Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY, USA.
2. National Center for Cryo-EM Access and Training, New York Structural Biology Center, New York, NY, USA.
3. Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA.
* Corresponding author: rice@nysbc.org

The introduction of direct detectors to electron microscopy has revolutionized structural biology [1]. Protein structures resolved to 3-4 Å are now commonly determined, resolutions between 2 Å and 3 Å are becoming more common, and resolutions beyond 2 Å are now possible in a few cases (Fig. 1). All of these single particle techniques require a large number of images.

The Simons Electron Microscopy Center (SEMC) currently has three Titan Krios microscopes on site, operating 24 hours per day, 7 days per week, 52 weeks per year. The use of Leginon software on all microscopes allows for unattended operation once collection queues are set up. Each microscope is currently equipped with both a Gatan K2 camera (Gatan Inc, Pleasanton CA) and a Falcon 3 camera (Thermo Fisher Scientific).

The K2 cameras typically collect movies of length 6-12 s, with a frame time generally between 150 ms and 250 ms. The raw camera size is 3838x3710 pixels, and our most common movie size is 50 frames, which corresponds to about 1400 MB if stored in uncompressed form. In addition, the aligned sum takes 55 MB of disk space and does not compress well under lossless schemes.
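The storage figures quoted above can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions that are not stated in the text: raw frames are taken to be stored at 2 bytes per pixel (16-bit integers), the aligned sum at 4 bytes per pixel (32-bit floats), and 1 MB is taken as 10^6 bytes. The function names are hypothetical.

```python
# Rough size estimate for a K2 movie and its aligned sum.
# ASSUMPTIONS (not from the text): 2 bytes/pixel for raw frames,
# 4 bytes/pixel for the aligned sum, 1 MB = 1e6 bytes.

FRAME_WIDTH = 3838    # raw camera width in pixels (from the text)
FRAME_HEIGHT = 3710   # raw camera height in pixels (from the text)
PIXELS_PER_FRAME = FRAME_WIDTH * FRAME_HEIGHT


def frame_count(exposure_s: float, frame_time_s: float) -> int:
    """Number of frames in a movie of the given length and frame time."""
    return round(exposure_s / frame_time_s)


def stack_size_mb(n_frames: int, bytes_per_pixel: int) -> float:
    """Uncompressed size in MB of n_frames full-size images."""
    return n_frames * PIXELS_PER_FRAME * bytes_per_pixel / 1e6


if __name__ == "__main__":
    # Example: a 10 s movie at 200 ms per frame gives the common 50-frame case.
    n = frame_count(10.0, 0.2)
    print(f"frames per movie: {n}")
    print(f"raw movie, uncompressed: {stack_size_mb(n, 2):.0f} MB")
    print(f"aligned sum: {stack_size_mb(1, 4):.0f} MB")
```

Under these assumptions a 50-frame movie comes to about 1424 MB and the aligned sum to about 57 MB, close to the 1400 MB and 55 MB quoted above. The small differences may reflect rounding or a different definition of the megabyte.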
Since installation of the first Titan Krios, movie collection has grown exponentially, and we are now approaching 5,000 movies per day (Fig. 2). We pre-process our data "on the fly" using the Appion workflow [2]. The advantage of this workflow is that it provides an easy interface for staff scientists to align frames, determine CTF parameters, and start picking particles by entering only a few parameters, with most required parameters for these programs either set by default or pulled from the Leginon database. The operator is therefore freed from much tedious decision making and can concentrate on the quality of the data itself as it comes down the pipeline.

Our standard data pipeline saves the raw frames as LZW-compressed TIFF stacks rather than MRC format stacks. The LZW TIFF compression is lossless and is performed in memory before saving to disk. This shortens both the write to disk, since less data needs to be written, and the transfer to our buffer computer. In addition, image processing software such as Relion [3] can read the LZW-compressed TIFF stack natively. These points provide clear advantages over our previous method, which was to save as MRC and then compress with bzip. Bzip-compressed stacks are slightly smaller, compressing to 15.8% of the original size versus 20.1% for TIFF, but the processing advantages greatly outweigh the small space saving.

Movies are saved onto the K2 computer and soon afterwards automatically moved to a buffer computer over a dedicated fiber line. The buffer computer is equipped with 50 TB of disk space and 2 Nvidia 1080

Microsc. Microanal. 25 (Suppl 2), 2019. doi:10.1017/S1431927619007700. © Microscopy Society of America 2019