Unsupervised classification of single particles by cluster tracking in multi-dimensional space

Jie Fu a, Haixiao Gao b, Joachim Frank a,b,*

a Department of Biomedical Sciences, State University of New York at Albany, Empire State Plaza, Albany, NY 12201-0509, USA
b Howard Hughes Medical Institute, Health Research, Inc. at the Wadsworth Center, Empire State Plaza, Albany, NY 12201-0509, USA
* Corresponding author. Fax: +1 518 486 2191. E-mail address: joachim@wadsworth.org (J. Frank).

Received 11 April 2006; received in revised form 7 June 2006; accepted 11 June 2006
Available online 21 July 2006

Journal of Structural Biology 157 (2007) 226–239
www.elsevier.com/locate/yjsbi
1047-8477/$ - see front matter
doi:10.1016/j.jsb.2006.06.012

Abstract

In cryo-electron microscopy (cryo-EM) single-particle reconstruction, the heterogeneity of two-dimensional projection image data resulting from the co-existence of different conformational or ligand binding states of a macromolecular complex remains a major obstacle as it impairs the validity of reconstructed density maps and limits the progress toward higher resolution. Classification of cryo-EM data according to the different conformations is difficult because of the coexistence of multiple orientations in a single dataset. Here, we present an unsupervised classification method, termed cluster tracking, which utilizes the continuity in multi-dimensional space induced by angular adjacency of projections in large datasets. In a proof of concept, the testing of cluster tracking on simulated projection data, which were generated from multiple conformations and orientations of an existing volume, produced clusters that are consistent with the conformational identity of the data. The application of the method to experimental cryo-EM projection data is found to result in a partition similar to the one generated by supervised classification.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Cluster tracking; Classification; Conformational heterogeneity

1. Introduction

Cryo-electron microscopy (cryo-EM), in combination with single-particle reconstruction (see Frank, 2006), is used increasingly for the visualization of biological macromolecular complexes, especially in situations where the complex is too large and too flexible to be amenable to crystallization for X-ray crystallography. Examples of structures whose study has greatly benefited from the development of this methodology include the ribosome (Frank, 2001) and GroEL (Ludtke et al., 2004; Saibil, 2000). The structure of the ribosome and its interaction with its ligands have been extensively studied by cryo-EM, which resulted in the discovery of the complex architecture of the ribosome, determination of binding positions of the factors and tRNA, and observations of conformational changes in response to the binding of protein factors (reviewed by Frank, 2001; Frank and Spahn, 2006).

One of the important assumptions in single-particle reconstruction is that all the data represent randomly oriented two-dimensional (2D) projections of the same three-dimensional (3D) structure; that is, the sample must be highly homogeneous. However, a high level of sample homogeneity is often difficult to achieve, especially when the molecule has flexible domains or when it can occur in different ligand binding states. Therefore, the projection images extracted from electron micrographs frequently represent projections of 3D structures that differ in conformation. A reconstruction from such mixed datasets cannot accurately portray any of the co-existing conformational states; for example, regions with structural changes often appear to be fragmented in the reconstructed EM maps. The heterogeneity of the dataset also adversely affects the resolution of reconstructed 3D volumes.
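The effect of mixing conformations described in the Introduction can be illustrated with a deliberately simplified toy calculation. The sketch below is not the reconstruction procedure used in this work: it assumes noise-free one-dimensional density profiles in place of 3D volumes, simple averaging in place of reconstruction, and two hypothetical conformations that differ only in the position of one mobile domain. All names and parameter values (`profile`, `state_a`, `state_b`, the positions 20 and 44) are illustrative assumptions.

```python
import math

def profile(center, n=64, width=2.0):
    """Toy 1D density profile: a single Gaussian 'domain' centered at `center`."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]

def average(profiles):
    """Pointwise mean of equally long profiles."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]

def correlation(a, b):
    """Pearson correlation coefficient between two profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Two hypothetical conformations: the mobile domain sits at position 20 or 44.
state_a = profile(20)
state_b = profile(44)

# A heterogeneous dataset: equal numbers of noise-free copies of each state.
mixed = average([state_a] * 50 + [state_b] * 50)

peak_pure = max(state_a)
peak_mixed = max(mixed)
corr_a = correlation(mixed, state_a)
corr_b = correlation(mixed, state_b)

print(f"peak density, pure state:    {peak_pure:.2f}")
print(f"peak density, mixed average: {peak_mixed:.2f}")
print(f"correlation of mixed average with state A: {corr_a:.2f}")
print(f"correlation of mixed average with state B: {corr_b:.2f}")
```

In this toy setting the averaged profile shows the mobile domain at both positions with roughly half the density of a pure state, and it correlates only partially with either conformation. This loosely mirrors the weakened or fragmented density that the text attributes to reconstructions from mixed datasets.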