Tetrahedron Computer Methodology, Vol. 3, No. 1, pp. 47 to 59, 1990
0898-5529/90 $3.00+.00
Printed in Great Britain. Pergamon Press plc

Comparative Molecular Field Analysis (CoMFA). 2. Toward its use with 3D-Structural Databases

Matthew Clark, Richard D. Cramer III*, Dumont M. Jones, David E. Patterson, and Perry E. Simeroth

Tripos Associates, 1699 South Hanley Road, St. Louis, MO 63144, USA

Received 28 February 1990, Accepted 15 April 1990

Key words: CoMFA; Field-fit; carbonyl addition; chance correlation; 3D databases; PLS

Abstract: The primary importance of molecular fields in biological recognition, attested by the number of reported successful CoMFA applications, suggests possible applications in 3D databases. In further support of this possibility, the probability of chance correlation using PLS in typical CoMFA applications is found to be about 5% or less for a cross-validated r² of 0.3 or greater, a "field fit" strategy for automating the alignment of molecules by seeking minimal differences in their fields is outlined, and a non-biological application of CoMFA, carbonyl hydration, is presented.

INTRODUCTION

Any computerized system for the systematic storage and retrieval of 3D-chemical structures is likely to be most useful if it is based on the features which are important to the recognizers of 3D-chemical structure. These features may not only be distances among atomic nuclei, or properties at molecular surfaces, but also may be differences in the fields that a molecule exerts at some distance from its surface.
Comparative Molecular Field Analysis (CoMFA) is a promising new approach to structure/biological activity correlation.1,2 Its rationale is two-fold: (1) at the molecular level, the interactions which produce an observed biological effect are usually non-covalent; (2) molecular mechanics force fields, most of which treat non-covalent (non-bonded) interactions only as steric and electrostatic forces, can account precisely for a great variety of observed molecular properties. Thus it seems reasonable that a suitable sampling of the steric and electrostatic fields surrounding a set of ligand (drug) molecules might provide all the information necessary for understanding their observed biological properties.

As currently implemented, the characteristic features of CoMFA are: (1) representation of ligand molecules by their steric and electrostatic fields, sampled at the intersections of a three-dimensional lattice; (2) a new "field fit" technique, allowing optimal mutual alignment within a series, by minimizing the RMS field differences between molecules, discussed below in more detail; (3) data analysis by partial least squares (PLS),3 using cross-validation to maximize the likelihood that the results have predictive validity; (4) graphic presentation of results, as contoured three-dimensional coefficient plots.
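Features (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions introduced only for illustration: the steric term is taken to be a Lennard-Jones 6-12 potential, the electrostatic term a Coulomb potential with a constant dielectric of 1, both energies are truncated at a hypothetical cutoff of 30 kcal/mol, and the probe parameters, atom parameters, and function names (`lattice`, `fields_at`, `field_vector`, `rms_difference`) are invented here. The sketch samples both fields at every lattice intersection and then computes the RMS field difference between two molecules, the quantity that a field-fit alignment would seek to minimize.

```python
import itertools
import math

COULOMB = 332.06  # converts e^2/Angstrom to kcal/mol


def lattice(lo, hi, step):
    """Return every (x, y, z) intersection of a cubic lattice spanning [lo, hi]."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return list(itertools.product(axis, repeat=3))


def fields_at(point, atoms, probe_charge=1.0, probe_radius=1.7,
              probe_eps=0.1, cutoff=30.0):
    """Steric and electrostatic probe energies at one lattice point.

    Each atom is (x, y, z, charge, vdw_radius, well_depth). The probe
    parameters and the cutoff are illustrative assumptions.
    """
    steric = 0.0
    elec = 0.0
    for x, y, z, charge, radius, eps in atoms:
        # Clamp tiny distances so a point sitting on a nucleus stays finite.
        r = max(math.dist(point, (x, y, z)), 1e-3)
        r_min = radius + probe_radius
        depth = math.sqrt(eps * probe_eps)
        ratio6 = (r_min / r) ** 6
        steric += depth * (ratio6 * ratio6 - 2.0 * ratio6)  # 6-12 form (assumed)
        elec += COULOMB * charge * probe_charge / r         # Coulomb, dielectric 1 (assumed)
    # Truncate so points inside the molecule do not dominate the comparison.
    steric = min(steric, cutoff)
    elec = max(-cutoff, min(elec, cutoff))
    return steric, elec


def field_vector(atoms, points):
    """Concatenate the (steric, electrostatic) pair from every lattice point."""
    values = []
    for p in points:
        values.extend(fields_at(p, atoms))
    return values


def rms_difference(field_a, field_b):
    """Root-mean-square difference between two equally sampled field vectors."""
    total = sum((a - b) ** 2 for a, b in zip(field_a, field_b))
    return math.sqrt(total / len(field_a))


# Usage: a hypothetical two-atom molecule and a copy shifted by 0.5 Angstrom in x.
points = lattice(-4.0, 4.0, 2.0)
reference = [(0.0, 0.0, 0.0, -0.5, 1.5, 0.1), (1.2, 0.0, 0.0, 0.5, 1.7, 0.1)]
shifted = [(x + 0.5, y, z, q, r, e) for x, y, z, q, r, e in reference]
misfit = rms_difference(field_vector(reference, points),
                        field_vector(shifted, points))
```

In a field-fit procedure of the kind outlined in the text, a value such as `misfit` would serve as the objective that an optimizer reduces by adjusting the position and orientation of one molecule relative to another. The optimizer itself is omitted from this sketch.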