Non-linear Prediction of Quantitative Structure–Activity Relationships

Peter Tiňo∗, School of Computer Science, Birmingham University, Birmingham B15 2TT, UK
Ian T. Nabney†, Neural Computing Research Group, Aston University, Birmingham B4 7ET, UK
Bruce S. Williams, Jens Lösel, Pfizer Global Research and Development, Sandwich, Kent CT13 9NJ, UK
Yi Sun, Faculty of Eng & Info Sciences, University of Hertfordshire, Hatfield, AL10 9AB, UK

Abstract

Predicting the logarithm of the partition coefficient P is a long-standing benchmark problem in Quantitative Structure–Activity Relationships (QSAR). In this paper we show that a relatively simple molecular representation (using 14 variables) can be combined with leading-edge machine learning algorithms to predict logP on new compounds more accurately than existing benchmark algorithms that use complex molecular representations.

1 Introduction

The majority of pharmaceutical agents must cross a biological membrane to reach their site of action and to be available in a cellular environment. The lipophilicity of a 'drug' molecule has a major impact on its distribution and biological action [1]. Hence quantitative measures of lipophilicity are very important in the development of drug molecules. The partition coefficient P of a molecule is the ratio of its equilibrium concentration in n-octanol to its equilibrium concentration in water [2]; its base-10 logarithm, logP, is a well-established measure of a compound's lipophilicity. In principle, measuring the equilibrium concentrations of solute in the octanol and water phases after shaking in a separatory funnel is very simple, and since good measured values are always preferable to calculated ones, there would seem to be little need for a procedure to calculate them. In practice, however, measuring logP for large numbers of compounds is costly and time-consuming, so computational methods are used to estimate or predict values where possible.
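To make the definition concrete, here is a minimal sketch that computes logP from a pair of equilibrium concentrations, logP = log10(c_octanol / c_water). The function name log_p and the concentration values are hypothetical and chosen only for illustration; they are not data or code from this study, and the sketch assumes a single neutral solute species (it ignores ionisation and other practical complications of the shake-flask measurement).

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """Return logP, the base-10 log of the octanol/water partition coefficient.

    Hypothetical helper for illustration. Both concentrations must be
    equilibrium values in the same units (e.g. mol/L); the units cancel
    in the ratio, so P itself is dimensionless.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    p = conc_octanol / conc_water  # partition coefficient P
    return math.log10(p)


# Hypothetical shake-flask readings in mol/L (illustrative only).
print(log_p(0.0100, 0.0001))  # P = 100, so logP = 2 (lipophilic)
print(log_p(0.0010, 0.0100))  # P = 0.1, so logP = -1 (hydrophilic)
```

A positive logP means the compound prefers the octanol phase and a negative value means it prefers water; because the scale is logarithmic, each unit corresponds to a tenfold change in P.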
In addition, it is valuable to have an estimate of lipophilicity before synthesising novel compounds, and this can only be done using a predictive model.

∗ Email: p.tino@cs.bham.ac.uk
† Email: i.t.nabney@aston.ac.uk