R.I. McKay and J. Slaney (Eds.): AI 2002, LNAI 2557, pp. 477–486, 2002. © Springer-Verlag Berlin Heidelberg 2002

Protein Sequences Classification Using Modular RBF Neural Networks

Dianhui Wang¹, N.K. Lee¹, T.S. Dillon¹, and N.J. Hoogenraad²

¹ Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, VIC 3083, Australia
Ph: +61-3-9479 3034, Fax: +61-3-9479 3060
dhwang@cs.latrobe.edu.au
² Department of Biochemistry, La Trobe University, Melbourne, VIC 3083, Australia

Abstract. A protein super-family consists of proteins that share amino acid sequence homology and may therefore be functionally and structurally related. One benefit of this grouping is that some hint of function may be deduced for individual members from information on other members of the family. Traditionally, two protein sequences are assigned to the same class if they show high homology in terms of feature patterns extracted by sequence alignment algorithms. These algorithms compare an unseen protein sequence with all identified protein sequences and return the higher-scoring ones. Because protein sequence databases are very large, exhaustive comparison against the existing protein sequences is very time-consuming. There is therefore a need for an improved classification system that identifies protein sequences effectively. This paper presents a modular neural classifier for protein sequences with improved classification criteria. The intelligent classification techniques described in this paper aim to improve on single neural classifiers based on a centralized information structure in terms of recognition rate, generalization, and reliability. The architecture of the proposed model is a modular RBF neural network with a compensational combination at the transition output layer.
The connection weights between the final output layer and the transition output layer are optimized by the delta rule and serve as an integrator of the local neural classifiers. To enhance classification reliability, we present two heuristic rules for decision-making. Two sets of protein sequences covering ten super-family classes, downloaded from a public-domain database, the Protein Information Resource (PIR), are used in our simulation study. We report experimental results comparing the performance of single neural classifiers with that of the proposed modular neural classifier.

1 Introduction

The aim of classification is to predict target classes for given input patterns. Many approaches are available for classification tasks, such as statistical techniques, decision trees [9], and neural networks [1]. Neural networks have been chosen as