JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 12, Number 9, 2005
© Mary Ann Liebert, Inc.
Pp. 1221–1241
Automatic 3D Protein Structure Classification
without Structural Alignment
ZEYAR AUNG and KIAN-LEE TAN
ABSTRACT
In this paper, we present a new scheme named ProtClass for automatic classification of
three-dimensional (3D) protein structures. It is a dedicated and unified multiclass classification
scheme. Neither detailed structural alignment nor multiple binary classifications
are required in this scheme. We adopt a nearest-neighbor classification strategy within a
filter-and-refine scheme. In the first step, we filter out the improbable answers using
the precalculated parameters from the training data. In the second step, we perform a relatively
more detailed nearest-neighbor search on the remaining answers. We use very concise
and effective encoding schemes of the 3D protein structures in both steps. We compare
our proposed method against two other dedicated protein structure classification schemes,
namely SGM and CPMine. The experimental results show that ProtClass is slightly more
accurate than SGM and much faster. In comparison with CPMine, ProtClass is
much more accurate, while their running times are about the same. We also compare
ProtClass against a structural alignment-based classification scheme named DALI, which
is found to be more accurate but extremely slow. The software is available upon request
from the authors. Supplementary information on the ProtClass method can be found at
http://xena1.ddns.comp.nus.edu.sg/~genesis/PClass.htm.
Key words: protein structure, abstract representation, filter-and-refine, nearest neighbor
classification.
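The filter-and-refine nearest neighbor strategy summarized in the abstract can be illustrated with a minimal sketch. It is not the ProtClass implementation: the coarse and fine feature vectors, the per-class centroid and radius used as "precalculated parameters", the slack factor, the Euclidean distance, and the class labels are all assumptions made only for illustration.

```python
import math
from collections import defaultdict

def precompute_class_params(training):
    """Hypothetical per-class parameters: centroid and radius of the coarse vectors."""
    by_class = defaultdict(list)
    for coarse, _fine, label in training:
        by_class[label].append(coarse)
    params = {}
    for label, vecs in by_class.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        radius = max(math.dist(v, centroid) for v in vecs)
        params[label] = (centroid, radius)
    return params

def classify(query_coarse, query_fine, training, params, slack=1.5):
    """Filter classes with the coarse encoding, then refine with a nearest neighbor search."""
    # Step 1 (filter): discard classes whose centroid is too far from the query.
    candidates = {
        label
        for label, (centroid, radius) in params.items()
        if math.dist(query_coarse, centroid) <= slack * radius
    }
    if not candidates:
        # Assumed fallback: if every class is filtered out, search all of them.
        candidates = set(params)
    # Step 2 (refine): nearest neighbor on the finer encoding, surviving classes only.
    nearest = min(
        (item for item in training if item[2] in candidates),
        key=lambda item: math.dist(query_fine, item[1]),
    )
    return nearest[2]

# Toy training set: (coarse vector, fine vector, class label), all values invented.
training = [
    ([0.0, 0.0], [0.0, 0.0, 0.0], "alpha"),
    ([1.0, 0.0], [1.0, 0.0, 0.0], "alpha"),
    ([10.0, 10.0], [9.0, 9.0, 9.0], "beta"),
    ([11.0, 10.0], [10.0, 9.0, 9.0], "beta"),
]
params = precompute_class_params(training)

print(classify([0.6, 0.1], [0.5, 0.1, 0.0], training, params))    # alpha
print(classify([10.4, 10.2], [9.5, 9.0, 9.0], training, params))  # beta
```

The point of the two-stage design is that the filter touches only one small record per class, so the more expensive neighbor search runs over a fraction of the training set.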
1. INTRODUCTION
Analyzing three-dimensional (3D) protein structures is an important task in bioinformatics.
Analysis of protein structures can give insights into the functions of proteins, which are useful in many
end-user applications such as drug discovery. Protein structure analysis involves such tasks as protein structure
comparison, classification, homology modeling, and prediction (Orengo et al., 2003). Researchers
have developed numerous automated methods for structural comparison, homology modeling, and
prediction in the last decade. However, only a few automated structural classification methods have been proposed
so far. Instead, people have relied on manual and semi-automatic classification schemes
such as SCOP (Hubbard et al., 1997) and CATH (Orengo et al., 1997). Or they use the traditional structural
Department of Computer Science, National University of Singapore, Singapore 117543.