Rapid Communication

CORAL: Quantitative Structure–Activity Relationship Models for Estimating Toxicity of Organic Compounds in Rats

A. P. TOROPOVA,¹ A. A. TOROPOV,¹ E. BENFENATI,¹ G. GINI,² D. LESZCZYNSKA,³ J. LESZCZYNSKI⁴

¹ Istituto di Ricerche Farmacologiche Mario Negri, Laboratory of Environmental Chemistry and Toxicology, 20156, Via La Masa 19, Milano, Italy
² Department of Electronics and Information, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milano, Italy
³ Interdisciplinary Nanotoxicity Center, Department of Civil and Environmental Engineering, Jackson State University, 1325 Lynch Street, Jackson, Mississippi 39217-0510
⁴ Interdisciplinary Nanotoxicity Center, Department of Chemistry and Biochemistry, Jackson State University, 1400 J. R. Lynch Street, P.O. Box 17910, Jackson, Mississippi 39217

Received 25 March 2011; Revised 6 May 2011; Accepted 9 May 2011
DOI 10.1002/jcc.21848
Published online 8 June 2011 in Wiley Online Library (wileyonlinelibrary.com).

Abstract: For six random splits, one-variable models of rat toxicity (minus decimal logarithm of the 50% lethal dose [pLD50], oral exposure) have been calculated with the CORAL software (http://www.insilico.eu/coral/). The total number of compounds considered is 689. Additional global attributes of the simplified molecular input line entry system (SMILES) have been examined as a means of improving the optimal SMILES-based descriptors. These global SMILES attributes represent the presence of certain chemical elements and of different kinds of chemical bonds (double, triple, and stereochemical). The "classic" scheme of building quantitative structure–property/activity relationships was compared with the balance of correlations (BC) with ideal slopes. For all six random splits, the best prediction is obtained when the BC, together with the global SMILES attributes, is included in the modeling process.
The average statistical characteristics for the external test set are the following: n = 119 ± 6.4, R² = 0.7371 ± 0.013, and root mean square error = 0.360 ± 0.037.

© 2011 Wiley Periodicals, Inc. J Comput Chem 32: 2727–2733, 2011

Key words: balance of correlations; oral rat toxicity; QSAR; SMILES

Introduction

Quantitative structure–property/activity relationships (QSPR/QSAR) are tools for predicting an endpoint for substances that have not been examined experimentally.[1–11] It is commonly accepted that a satisfactory QSPR/QSAR model should be adequate for both the training set and the external test set. However, there are cases in which a modest (or even poor) model for the training set is accompanied by a satisfactory model for the external test set. It is important, however, that such a result be reproduced in several probes of the model-building process and for a group of splits into training and test sets.[12,13]

QSPR/QSAR approaches are criticized in the literature[14–16] because an external test is frequently absent[17] and because, depending on the split of the data into training and test sets, the outcome can be either satisfactory or unacceptable. Basak et al.[18] have suggested the term "inflated" for QSAR models that are unreliable for making predictions for chemicals similar to those used to calibrate the model. Indeed, the success of the result can depend on the split of the compounds among the training, calibration, and test sets. A possible way to avoid inflated correlations[18] is, first, to consider a group of splits and, second, to regard the reliability of QSPR/QSAR models as a more important quality than their precision.[12,13]

Additional Supporting Information may be found in the online version of this article.
Correspondence to: A. A. Toropov; e-mail: andrey.toropov@marionegri.it
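
As an illustration of how external-test statistics averaged over a group of splits, such as those quoted in the abstract, might be tabulated, the following Python sketch computes R² and the root mean square error for each split and then their mean ± standard deviation over the splits. It is not part of CORAL and is only a sketch under stated assumptions: the function names (`r_squared`, `rmse`, `summarize_splits`) are hypothetical, R² is assumed to be the squared Pearson correlation between observed and predicted pLD50, the ± term is assumed to be the sample standard deviation, and the numbers in the demo are synthetic toy values, not data from this study.

```python
import math
import random
import statistics


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: this is the R^2 convention in use; other definitions exist.
    """
    mean_o = statistics.fmean(observed)
    mean_p = statistics.fmean(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared_errors))


def summarize_splits(splits):
    """Mean and sample standard deviation of n, R^2, and RMSE over splits.

    `splits` is a list of (observed, predicted) pairs, one pair per
    external test set (hypothetical input layout).
    """
    sizes = [len(obs) for obs, _ in splits]
    r2_values = [r_squared(obs, pred) for obs, pred in splits]
    rmse_values = [rmse(obs, pred) for obs, pred in splits]
    return {
        "n_mean": statistics.fmean(sizes),
        "n_sd": statistics.stdev(sizes),
        "r2_mean": statistics.fmean(r2_values),
        "r2_sd": statistics.stdev(r2_values),
        "rmse_mean": statistics.fmean(rmse_values),
        "rmse_sd": statistics.stdev(rmse_values),
    }


if __name__ == "__main__":
    # Synthetic toy data standing in for six external test sets.
    rng = random.Random(0)
    toy_splits = []
    for _ in range(6):
        observed = [rng.uniform(1.0, 4.0) for _ in range(20)]
        predicted = [o + rng.gauss(0.0, 0.4) for o in observed]
        toy_splits.append((observed, predicted))

    stats = summarize_splits(toy_splits)
    print(f"n    = {stats['n_mean']:.0f} ± {stats['n_sd']:.1f}")
    print(f"R2   = {stats['r2_mean']:.4f} ± {stats['r2_sd']:.3f}")
    print(f"RMSE = {stats['rmse_mean']:.3f} ± {stats['rmse_sd']:.3f}")
```

A small spread of R² and RMSE across the splits is one simple indication that the quality of a model does not hinge on a single favorable division of the compounds.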