DOI: 10.1002/minf.201300172

Global Model for Octanol-Water Partition Coefficients from Proton Nuclear Magnetic Resonance Spectra

Nan An, [a] Farid Van Der Mei, [a] and Adelina Voutchkova-Kostal* [a]

[a] N. An, F. Van Der Mei, A. Voutchkova-Kostal
Chemistry Department, The George Washington University
Washington, DC 20052, USA
*e-mail: avoutchkova@gwu.edu

Supporting information for this article is available on the WWW under http://dx.doi.org/10.1002/minf.201300172.

Full Paper, www.molinf.com. Mol. Inf. 2014, 33, 286–292. © 2014 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

Abstract: The ability to estimate chemical and physical properties from experimental spectra is highly desirable, as it eliminates the need for a priori knowledge of the exact chemical structure and allows property estimation for mixtures. Here we report a proof of principle that a predictive method for the octanol-water partition coefficient (logP) based on ¹H NMR spectra in deuterated chloroform (CDCl₃) is feasible and can yield accuracy comparable to in silico logP models. The Quantitative Spectrometric Data-Activity Relationship (QSDAR) reported here predicts logP of neutral organic chemicals using descriptors derived only from ¹H NMR chemical shifts, integrations and peak widths. Proton NMR spectra of 140 compounds with diverse structures were used to construct a Multiple Linear Regression (MLR) model and a Partial Least Squares (PLS) model that predict logP. The optimized models were validated internally by K-fold cross-validation and leave-one-out validation, and externally with a test set of 28 chemicals. The squared regression coefficients of prediction for the MLR and PLS regression models were 0.970 and 0.971, respectively, showing that the method allows accurate prediction of logP values exclusively from predicted ¹H NMR spectra.

Keywords: Octanol-water partition · NMR · QSDAR · PLS · MLR

1 Introduction

The octanol-water partition coefficient (logP) of a chemical is the logarithm of the equilibrium ratio of its concentrations in the octanol and water phases, and reflects its molecular hydrophobicity. logP is a widely used physicochemical property in medicinal chemistry and toxicology. [1] Medicinal chemists routinely use logP to estimate the oral [2] and skin [3] bioavailability of drug candidates and to build QSAR models, while ecotoxicologists and regulators use it to model acute and chronic toxicity to aquatic species [4,5] and the potential for bioaccumulation. [6] Rules of thumb for designing chemicals that are minimally toxic to aquatic species are also based on logP: for example, compounds with logP less than 2 are ca. 80 % more likely to have low acute and chronic toxicity to aquatic species. [7] logP is thus a ubiquitous property that is routinely determined by chemists, toxicologists and regulators, and streamlined, accurate methods for its determination are highly desirable.

A number of experimental techniques are available for determining logP. These range from the traditional shake-flask method, [8] which requires extensive centrifugation, to more modern methods involving HPLC, [9] microemulsion electrokinetic chromatography, [10] and centrifugal partition chromatography. [11] The modern methods are more convenient than the shake-flask method, but are limited to compounds within certain ranges of logP or pKa values, and can be less reliable than the shake-flask method. [12] Some classes of compounds, such as surfactants, pose a particular challenge for certain methods. For example, the HPLC method for measuring logP is not applicable to surfactants because their retention times on the chromatography column also depend on the surfactant's preference for surfaces and interfaces. [13] Furthermore, all of these methods require prior purification of the chemical.

To provide rapid and convenient methods for logP determination, several types of in silico estimation methods have been developed. [14] The group contribution methods, for example, use the relative contributions of molecular fragments or atoms to predict logP. The predictive power of the most commonly used group contribution tools, such as ALOGP, [15] CLOGP, [16] ACD, [17] and KOWWIN, [18] is in the range r² = 0.90–0.95, based on training sets of up to 13 000 compounds. [19] Although very fast and accurate on their training sets, these methods often show lower accuracy when externally validated (r²: 0.51–0.91), [23] which could be due to the restriction of their applicability domains to structures containing predefined fragments. In addition, group contribution methods do not take into account whole-molecule attributes, such as surface area, dipole moment and connectivity. Methods based on multiple linear regressions of molecular topology descriptors overcome some of these challenges. For example, methods implemented in VLOGP [19b] employ electrotopo-