DOI: 10.1002/minf.201400030

External Evaluation of QSAR Models, in Addition to Cross-Validation: Verification of Predictive Capability on Totally New Chemicals

Paola Gramatica* [a]

Dear Editors,

An interesting paper by Gütlein et al., recently published in your journal, [1] has reopened the debate on the crucial topic of QSAR model validation, which, over the past decade, has been the subject of wide discussion in the scientific and regulatory communities. Many notable scientific papers have been published (I cite here only a few of the most pertinent [2–17]) with different underlying ideas on the “best” way to validate QSAR models, using various methodological approaches: a) by cross-validation (CV) only, [1,6–9] simple or double; b) by an additional external validation [2–5,10–17] (better if verified, in my opinion, by different statistical parameters), [15–18] after the necessary preliminary internal validation by CV. The common final aim is to propose QSAR models that are not only statistically robust but also have a verified high predictive capability. The two approaches differ on one point: how to verify the predictive performance of a QSAR model when it is applied to completely new chemicals.

In the Introduction to their paper, [1] Gütlein et al. wrote: “Many (Q)SAR researchers consider validation with a single external test set as the “gold standard” to assess model performance and they question the reliability of cross-validation procedures”. In my opinion, this point is not commented on clearly, at least with reference to my cited work, [10] so I wish to clarify my validation approach in order to highlight and resolve some misunderstandings.
First of all, I am sure that no good QSAR modeller would disagree that CV (not simply by leave-one-out, LOO, but also by leave-many-out, LMO, and/or bootstrap) is a necessary preliminary step in any QSAR validation, and that it is unquestionably the best way to validate each model for its statistical performance, in terms of the robustness and predictivity of partial sub-models on chemicals that have been iteratively put aside (held out) in the test sets. According to some authors, [2–14] including me, [10] this should be defined as internal validation, because by the end of the complete modelling process the molecular structures of all the chemicals have been seen within the validation procedure, and their structural information has contributed to the molecular descriptor selection in at least one run of CV, when they were iteratively placed in the training sub-set. Therefore, they are not really external (completely new) to the final model. Indeed, internal validation parameters for proposed QSAR models must always be reported in publications to guarantee model robustness.

Moreover, in QSAR modelling it is important to distinguish an approach that proposes predicted data from a specific single model (easily reproduced by any user) from an approach that produces predicted data by averaging the results of multiple models, and therefore by a more complex algorithm. In my research I always apply the first approach, while the work discussed by Gütlein et al. in their paper uses the second one.
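As an illustration of the internal validation by CV discussed above, the following is a minimal Python sketch that computes the goodness of fit (R²) and two cross-validated coefficients (Q²LOO and Q²LMO) for a single OLS model. It is not the QSARINS implementation. The data are synthetic, the function names are hypothetical, and the formula Q² = 1 − PRESS/TSS, the 30% leave-out fraction, and the number of LMO repetitions are assumptions chosen for illustration. Descriptor selection is deliberately omitted: in a real workflow every chemical would also contribute to that step, which is why these parameters describe internal rather than external validation.

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; return the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predict responses for descriptor matrix X from fitted coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2_fit(X, y):
    """Goodness of fit: the model is fitted and evaluated on the same data."""
    resid = y - predict_ols(fit_ols(X, y), X)
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)

def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS/TSS (assumed definition)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i            # every chemical except the i-th
        coef = fit_ols(X[keep], y[keep])    # sub-model without chemical i
        press += (y[i] - predict_ols(coef, X[i:i + 1])[0])**2
    return 1.0 - press / np.sum((y - y.mean())**2)

def q2_lmo(X, y, frac_out=0.3, n_iter=200, seed=0):
    """Leave-many-out Q2, averaged over repeated random splits (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = max(1, int(round(frac_out * n)))
    scores = []
    for _ in range(n_iter):
        out = rng.choice(n, size=n_out, replace=False)   # held-out chemicals
        keep = np.setdiff1d(np.arange(n), out)           # training sub-set
        coef = fit_ols(X[keep], y[keep])
        press = np.sum((y[out] - predict_ols(coef, X[out]))**2)
        tss = np.sum((y[out] - y[keep].mean())**2)       # training-set mean
        scores.append(1.0 - press / tss)
    return float(np.mean(scores))

# Synthetic example: 40 "chemicals", 3 "descriptors" (purely illustrative).
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=40)

r2 = r2_fit(X, y)
q2loo = q2_loo(X, y)
q2lmo = q2_lmo(X, y)
print(f"R2 = {r2:.3f}, Q2_LOO = {q2loo:.3f}, Q2_LMO = {q2lmo:.3f}")
```

For OLS, Q²LOO computed this way cannot exceed R², so a large gap between the two is a simple warning sign of an unstable model. Because every chemical enters a training sub-set in some iteration, none of these numbers says anything yet about chemicals that are completely new to the model.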
The reason to prefer a single model, which is a unique specific regression equation based on a few selected descriptors with their relative coefficients, is mainly that the “unambiguous algorithm” (requested by the second of the famous “OECD Principles for validation of QSAR models and applicability in regulation” [19]) should be the simplest and most easily reproducible, and therefore easily applicable by a wide range of users, including regulators under the new European legislation on chemicals, REACH.

According to Principle 4, discussed in depth in my previous paper [10] and in the Guidance Document on the OECD Principles, [20] the model must be verified for its goodness of fit (by R²), robustness (by internal cross-validation: Q²LOO and Q²LMO) and external predictivity (on external set compounds, which did not take part in the model development). The Guidance Document also makes a clear distinction between internal and external validation in this sense.

Only models with good internal validation parameters, which guarantee their robustness, should be chosen from among all the single models obtained by using the Genetic Algorithm (GA) as the method for descriptor selection in Ordinary Least Squares (OLS) regression (my QSAR approach, as implemented in my in-house software QSARINS). [18] However, my personal experience (and not only mine) [5,10] is that some QSAR models show good performance when verified

[a] P. Gramatica
QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria
Via Dunant 3, 21100, Varese, Italy
*e-mail: paola.gramatica@uninsubria.it

© 2014 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. Mol. Inf. 2014, 33, 311–314. Letter to the Editors. www.molinf.com