JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 12, Number 9, 2005
© Mary Ann Liebert, Inc.
Pp. 1221–1241

Automatic 3D Protein Structure Classification without Structural Alignment

ZEYAR AUNG and KIAN-LEE TAN

ABSTRACT

In this paper, we present a new scheme named ProtClass for automatic classification of three-dimensional (3D) protein structures. It is a dedicated and unified multiclass classification scheme. Neither detailed structural alignment nor multiple binary classifications are required in this scheme. We adopt a nearest neighbor-based classification strategy within a filter-and-refine scheme. In the first step, we filter out the improbable answers using the precalculated parameters from the training data. In the second step, we perform a relatively more detailed nearest neighbor search on the remaining answers. We use very concise and effective encoding schemes of the 3D protein structures in both steps. We compare our proposed method against two other dedicated protein structure classification schemes, namely SGM and CPMine. The experimental results show that ProtClass is slightly better in accuracy than SGM and much faster. In comparison with CPMine, ProtClass is much more accurate, while their running times are about the same. We also compare ProtClass against a structural alignment-based classification scheme named DALI, which is found to be more accurate but extremely slow. The software is available upon request from the authors. Supplementary information on the ProtClass method can be found at: http://xena1.ddns.comp.nus.edu.sg/genesis/PClass.htm.

Key words: protein structure, abstract representation, filter-and-refine, nearest neighbor classification.

1. INTRODUCTION

Analyzing three-dimensional (3D) protein structures is an important task in bioinformatics. Analysis of protein structures can give insights into the functions of proteins, which are useful in many end-user applications such as drug discovery.
Protein structure analysis involves such tasks as protein structure comparison, classification, homology modeling, and prediction (Orengo et al., 2003), and researchers have developed numerous automated methods for structural comparison, homology modeling, and prediction over the last decade. However, only a few automated structural classification methods have been proposed so far. Instead, people have relied on manual and semi-automatic classification methods such as SCOP (Hubbard et al., 1997) and CATH (Orengo et al., 1997).

Department of Computer Science, National University of Singapore, Singapore 117543.

Or they use the traditional structural