R.I. McKay and J. Slaney (Eds.): AI 2002, LNAI 2557, pp. 477–486, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Protein Sequences Classification Using Modular RBF Neural Networks
Dianhui Wang¹, N.K. Lee¹, T.S. Dillon¹ and N.J. Hoogenraad²
¹ Department of Computer Science and Computer Engineering
La Trobe University, Melbourne, VIC 3083, Australia
Ph: +61-3-9479 3034 Fax: +61-3-9479 3060
dhwang@cs.latrobe.edu.au
² Department of Biochemistry
La Trobe University, Melbourne, VIC 3083, Australia
Abstract. A protein super-family consists of proteins that share amino acid sequence homology and may therefore be functionally and structurally related. One benefit of this grouping is that some hint of the function of an individual member may be deduced from information on other members of the family. Traditionally, two protein sequences are assigned to the same class if they show high homology in terms of feature patterns extracted by sequence alignment algorithms. These algorithms compare an unseen protein sequence with all identified protein sequences and return the highest-scoring ones. Because protein sequence databases are very large, exhaustively comparing a query against all existing protein sequences is very time-consuming. There is therefore a need for an improved classification system that identifies protein sequences effectively. This paper presents a modular neural classifier for protein sequences with improved classification criteria. The intelligent classification techniques described in this paper aim to improve on single neural classifiers with a centralized information structure in terms of recognition rate, generalization and reliability. The proposed model is a modular RBF neural network with a compensational combination at the transition output layer. The connection weights between the transition output layer and the final output layer, which serve as an integrator of the local neural classifiers, are optimized by the delta rule. To enhance classification reliability, we present two heuristic rules for decision-making. Two sets of protein sequences covering ten super-families, downloaded from a public domain database, the Protein Information Resource (PIR), are used in our simulation study. Experimental results compare the performance of single neural classifiers with that of the proposed modular neural classifier.
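The modular architecture summarized above can be illustrated with a small sketch. It is not the authors' implementation: it assumes Gaussian RBF hidden units with fixed centres, treats the transition output layer as the plain concatenation of the class scores produced by each local RBF module, and fits the integrating weights V with the delta (Widrow-Hoff) rule, v_ki += eta * (t_k - y_k) * z_i, on one-hot targets. The compensational combination and the two heuristic decision rules are not specified in this section, so they are not reproduced. The names `RBFModule`, `transition_layer`, `final_output`, `train_integrator` and `classify`, the toy data and the hand-set module weights are all hypothetical.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian basis function exp(-||x - c||^2 / (2 * width^2))."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))


class RBFModule:
    """Hypothetical local RBF classifier: fixed centres and linear output weights."""

    def __init__(self, centers, width, weights):
        self.centers = centers  # list of centre vectors, one per hidden unit
        self.width = width      # Gaussian width shared by all hidden units
        self.weights = weights  # weights[k][j]: hidden unit j -> class score k

    def forward(self, x):
        hidden = [gaussian_rbf(x, c, self.width) for c in self.centers]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weights]


def transition_layer(modules, x):
    """Assumed transition output layer: concatenated class scores of all modules."""
    z = []
    for module in modules:
        z.extend(module.forward(x))
    return z


def final_output(V, z):
    """Linear final output layer: y_k = sum_i V[k][i] * z[i]."""
    return [sum(v * zi for v, zi in zip(row, z)) for row in V]


def train_integrator(modules, data, n_classes, lr=0.1, epochs=200):
    """Fit integrating weights V with the delta rule: V[k][i] += lr * (t_k - y_k) * z_i."""
    n_inputs = sum(len(m.weights) for m in modules)
    V = [[0.0] * n_inputs for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in data:
            z = transition_layer(modules, x)
            y = final_output(V, z)
            for k in range(n_classes):
                error = (1.0 if k == label else 0.0) - y[k]  # one-hot target
                for i in range(n_inputs):
                    V[k][i] += lr * error * z[i]
    return V


def classify(modules, V, x):
    """Return the index of the largest final output (no rejection rule applied)."""
    y = final_output(V, transition_layer(modules, x))
    return max(range(len(y)), key=lambda k: y[k])


# Toy usage: two hand-set modules on 2-D inputs, two classes.
modules = [
    RBFModule([(0.0, 0.0), (1.0, 1.0)], 0.5, [[1.0, 0.0], [0.0, 1.0]]),
    RBFModule([(0.2, 0.0), (0.8, 1.0)], 0.5, [[1.0, 0.0], [0.0, 1.0]]),
]
data = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((1.0, 1.0), 1), ((0.9, 0.8), 1)]
V = train_integrator(modules, data, n_classes=2)
print([classify(modules, V, x) for x, _ in data])
```

Keeping the local modules fixed and training only the integrating layer turns the second stage into a linear least-squares problem, which the delta rule approximates incrementally, one pattern at a time. In a real system each module's own output weights would also be trained rather than set by hand, and its inputs would be features derived from protein sequences rather than 2-D points.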
1 Introduction
The aim of classification is to predict target classes for given input patterns. There are
many approaches available for classification tasks, such as statistical techniques, decision trees [9] and neural networks [1]. Neural networks have been chosen as