Molecular Diversity (2006) 10: 213–221
DOI: 10.1007/s11030-005-9008-y
© Springer 2006

Full-length paper

A novel RBF neural network training methodology to predict toxicity to Vibrio fischeri

Georgia Melagraki¹, Antreas Afantitis¹, Haralambos Sarimveis²,*, Olga Igglessi-Markopoulou¹ & Alex Alexandridis²

¹ Laboratory of Organic Chemistry, School of Chemical Engineering, National Technical University of Athens, 9, Heroon Polytechniou Str., Zografou Campus, Athens 15780, Greece
² Laboratory of Process Control & Informatics, School of Chemical Engineering, National Technical University of Athens, 9, Heroon Polytechniou Str., Zografou Campus, Athens 15780, Greece
(* Author for correspondence, E-mail: hsarimv@central.ntua.gr, Tel.: +30-210-7723237, Fax: +30-210-7723138)

Received 8 November 2005; Accepted 14 December 2005

Keywords: neural network, QSTR, RBF architecture, toxicity, Vibrio fischeri

Summary

This work introduces a neural network methodology for developing QSTR predictors of toxicity to Vibrio fischeri. The method adopts the Radial Basis Function (RBF) architecture and the fuzzy means training strategy, which is fast and repetitive, in contrast to most traditional training techniques. The data set that was utilized consisted of 39 organic compounds and their corresponding toxicity values to Vibrio fischeri, while lipophilicity, equalized electronegativity and one topological index were used to provide input information to the models. The performance and predictive ability of the RBF model were illustrated through external validation and various statistical tests. The proposed methodology can be used to successfully model toxicity to Vibrio fischeri for a heterogeneous set of compounds.

1. Introduction

Toxicology deals with the quantitative assessment of the toxic effects to organisms in relation to the level, duration and frequency of exposure.
In general, exposure to toxic substances is to be avoided and thus toxicity assessment of such compounds is vital [1]. Among the bacterial assays, the Vibrio fischeri luminescence inhibition assay is the most popular. Bioluminescent bacteria toxicity tests offer a convenient, sensitive and efficient ethical alternative to testing on higher species [2, 3].

As the experimental determination of toxicological properties is a costly and time-consuming process, it is essential to develop mathematical predictive relationships to theoretically quantify toxicity [4, 5]. Quantitative Structure-Toxicity Relationship (QSTR) studies can provide a useful tool for achieving this goal, that is, predicting the toxic potency of untested compounds [6, 7]. Apart from serving as predictors of ecological and human health effects, QSTRs are also utilized in the process of designing safer chemicals for commercial use. The use of toxicity data from Vibrio fischeri tests in the development of QSTRs is adopted in several publications [8–11].

For the formal description of relationships between activity measures and structural descriptors of compounds, various statistical techniques can be used. Among them, the most popular are Multiple Linear Regression (MLR) [12–14] and Partial Least Squares (PLS) [7]. Several other statistical techniques have been used for the same purpose, including discriminant analysis, principal component analysis (PCA) and factor analysis, cluster analysis, multivariate analysis, and adaptive least squares [5, 15]. Neural Network (NN) techniques have also been applied successfully in developing quantitative structure-activity relationships [16–20]. NNs have gained attention due to their ability to describe non-linear relationships with success.
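The difference between a linear technique such as MLR and a non-linear network model can be seen in the functional form of the two predictors. The sketch below is illustrative only: the Gaussian basis function, the single shared width, the function names `mlr_predict` and `rbf_predict`, and all numerical values are assumptions made for the example and are not taken from this work.

```python
import math

def mlr_predict(intercept, coefficients, descriptors):
    """Linear model: intercept plus a weighted sum of the descriptors."""
    return intercept + sum(b * x for b, x in zip(coefficients, descriptors))

def rbf_predict(centers, width, weights, descriptors):
    """Weighted sum of Gaussian responses, one per hidden node center.

    The Gaussian basis and the shared width are assumptions for this sketch.
    """
    output = 0.0
    for center, weight in zip(centers, weights):
        # Squared Euclidean distance between the input and this center
        squared_distance = sum((x - c) ** 2 for x, c in zip(descriptors, center))
        output += weight * math.exp(-squared_distance / (width ** 2))
    return output

# Hypothetical compound described by three scaled descriptors
compound = [0.2, 0.5, 0.8]

# Hypothetical model parameters, chosen only to exercise the functions
linear_estimate = mlr_predict(1.0, [0.5, -0.3, 0.1], compound)
rbf_estimate = rbf_predict(
    centers=[[0.2, 0.5, 0.8], [0.9, 0.1, 0.4]],
    width=0.5,
    weights=[2.0, -1.0],
    descriptors=compound,
)
print(linear_estimate, rbf_estimate)
```

In the linear model each descriptor contributes independently and proportionally, whereas in the basis-function model the response depends on the distance of the compound from each center, which is what allows non-linear relationships to be represented.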
The objective of this work was to investigate the potential of using a special neural network architecture, namely Radial Basis Function (RBF) networks, in the development of a QSTR model for predicting toxicity of compounds to Vibrio fischeri. More specifically, a recently introduced training methodology for generating RBF neural networks was utilized. The method uses the innovative fuzzy means clustering technique to determine the number and the locations of the hidden node centers [21]. The most significant advantages of this method compared to