COREPA-M: A Multi-Dimensional Formulation of COREPA

Ovanes Mekenyan a,*, Nina Nikolova b, Patricia Schmieder c and Gilman Veith d

a Laboratory of Mathematical Chemistry, University "Prof. As. Zlatarov", 8010 Bourgas, Bulgaria
b Central Laboratory of Parallel Processing, Bulgarian Academy of Sciences, "Acad. G. Bonchev" str. 25A, 1756 Sofia, Bulgaria
c U.S. Environmental Protection Agency, Mid-Continent Ecology Division, 6201 Congdon Blvd., Duluth, MN 55804, U.S.A.
d International QSAR Foundation for Reducing Animal Testing, Duluth, Minnesota, U.S.A.

Full Paper

Recently, the COmmon REactivity PAttern (COREPA) approach was developed as a probabilistic classification method, formalized specifically to advance mechanistic QSAR development by addressing the impact of molecular flexibility on the stereoelectronic properties of chemicals. In the initial version of COREPA, the probability distribution of only one stereoelectronic parameter at a time was analyzed for the series of chemicals under study. Going beyond one parameter at a time requires the capability to analyze a suite of parameters simultaneously for each chemical. This work creates that capability through a multi-dimensional formulation of COREPA, which is expected to make the method more reliable at discriminating complex patterns. Probability distance measures such as the Kullback-Leibler divergence and the Hellinger distance are used to define the set of parameters that best discriminates activity. The COREPA-M system automatically identifies the parameters that best discriminate chemicals in groups defined by comparable reactivity endpoints. A detailed Bayesian decision tree is then used to classify untested chemicals, with "goodness of fit" criteria as measures of reliability. COREPA-M is illustrated using the example of modelling the binding affinity of chemicals at the aryl hydrocarbon receptor.
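As a rough illustration of how probability distance measures can rank parameters, the sketch below compares the binned distribution of each descriptor among active chemicals with that among inactive chemicals, then orders the descriptors by Hellinger distance. This section gives no binning scheme, smoothing rule or distance convention for COREPA-M. The small `eps` smoothing, the 1/2 factor in the Hellinger distance, the helper names (`kl_divergence`, `hellinger`, `rank_parameters`) and the descriptor counts are therefore all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(counts: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Convert histogram counts to probabilities.

    A tiny eps is added to every bin (an assumption, not taken from COREPA-M)
    so that empty bins never produce log(0) or a division by zero.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p_counts: Sequence[float], q_counts: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(P || Q) = sum_i p_i * ln(p_i / q_i); not symmetric."""
    p, q = _normalize(p_counts), _normalize(q_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def hellinger(p_counts: Sequence[float], q_counts: Sequence[float]) -> float:
    """Hellinger distance H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), in [0, 1]."""
    p, q = _normalize(p_counts), _normalize(q_counts)
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))


def rank_parameters(active: Dict[str, Sequence[float]],
                    inactive: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Hypothetical helper: order descriptors by how far apart the two class histograms lie."""
    scores = [(name, hellinger(active[name], inactive[name])) for name in active]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical binned counts for two descriptors (each descriptor uses the same bins in both classes).
active = {"E_LUMO": [0, 1, 8, 1], "dipole": [2, 3, 3, 2]}
inactive = {"E_LUMO": [5, 4, 1, 0], "dipole": [3, 2, 2, 3]}
ranking = rank_parameters(active, inactive)
print(ranking)
```

The Hellinger distance was chosen for the ranking because it is symmetric and bounded. The KL divergence is asymmetric (D(P || Q) differs from D(Q || P)), so using it for ranking would require fixing a direction or symmetrizing it.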
1 Introduction

The evolution of QSAR approaches for chemical design and risk management involves three developments: new methods for quantifying molecular structure, the identification of more mechanistic endpoints within biological pathways, and more objective approaches for discovering plausible mechanistic structure-activity relationships. Using stereoelectronic parameters to quantify structural variation in heterogeneous datasets and libraries requires a formal treatment of chemical flexibility. Even a moderately flexible chemical can have many low-energy structures that are not adequately represented by its minimum-energy conformation. The lowest-energy conformer might interact only weakly with macromolecules or be sterically incompatible with them, whereas other conformations within permitted energy boundaries may interact strongly [1–5]. For example, QSARs for binding affinity to the aryl hydrocarbon receptor (AhR) based on minimum-energy conformations have generally failed, whereas QSARs using charge-transfer parameters computed for the most planar conformations were successful [6]. Even in the AhR model, however, the choice of the most planar conformation was only a reasonable assumption. It rested on knowledge of the receptor and was imposed on the QSAR analysis rather than derived directly from the data. For more complex QSAR explorations to succeed without a priori assumptions about geometry, a formal mathematical approach is needed to derive models for complex interactions.

The COmmon REactivity PAttern (COREPA) formalism treats this complex QSAR exploration as a classification task [2,3]. Classification methods use a training set of objects drawn from several predefined classes to identify criteria that assign an unknown object to one of those classes. Probabilistic methods, discriminant analyses, nearest-neighbour classifiers, neural networks and decision trees are representative classification techniques.
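The flexibility argument above can be made concrete with a small sketch. Instead of reading a descriptor from the single minimum-energy conformer, the code builds a distribution of that descriptor over every conformer lying within an energy window of the minimum. This section does not specify the window size, whether conformers are weighted equally or by energy, or which descriptor is used. The equal weighting, the kcal/mol unit, the torsion-angle descriptor, the names `Conformer` and `descriptor_distribution`, and all numbers are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Conformer:
    energy: float      # conformational energy on any common scale (unit assumed to be kcal/mol)
    descriptor: float  # one descriptor value computed for this geometry


def descriptor_distribution(conformers: Sequence[Conformer],
                            bin_edges: Sequence[float],
                            energy_window: float) -> List[float]:
    """Fraction of accepted conformers whose descriptor falls in each bin.

    A conformer is accepted if it lies within energy_window of the lowest-energy
    conformer, and each accepted conformer gets equal weight (an assumption).
    Bins are [left, right), except the last bin, which also includes its right edge.
    Values outside the overall bin range are ignored.
    """
    e_min = min(c.energy for c in conformers)
    accepted = [c for c in conformers if c.energy - e_min <= energy_window]
    counts = [0] * (len(bin_edges) - 1)
    for c in accepted:
        idx = bisect.bisect_right(bin_edges, c.descriptor) - 1
        if c.descriptor == bin_edges[-1]:
            idx = len(counts) - 1  # close the last bin on the right
        if 0 <= idx < len(counts):
            counts[idx] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * len(counts)


# Hypothetical conformers of a biphenyl-like chemical; descriptor = inter-ring torsion angle (degrees).
conformers = [Conformer(0.0, 40.0), Conformer(0.4, 35.0), Conformer(1.2, 20.0),
              Conformer(3.5, 0.0), Conformer(6.0, 90.0)]
edges = [0.0, 30.0, 60.0, 90.0]
print(descriptor_distribution(conformers, edges, energy_window=4.0))
```

In this made-up example the minimum-energy conformer alone gives a torsion of 40 degrees. The 4 kcal/mol window also admits a fully planar (0 degree) conformer, which puts half of the probability mass in the 0 to 30 degree bin. A geometry of this kind is what the AhR example shows can matter for binding.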
The COREPA formalism uses a Bayesian probabilistic method to identify common structural characteristics among chemicals that elicit similar biological activity (i.e., that belong to the same class). It does so, however, in a context that allows many possible conformations of each chemical, and it describes each chemical by the probability distribution of molecular descriptor values rather than by a single parameter value.

QSAR Comb. Sci. 2004, 23 DOI: 10.1002/qsar.200330853 © 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

* To receive all correspondence

Key words: QSAR, chemical screening, drug design, Bayesian chemistry
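The Bayesian treatment of descriptor distributions described in the introduction can be sketched as follows. Each training chemical contributes a per-conformer descriptor distribution, and the chemicals in each activity class are pooled into a class distribution. Bayes' rule then gives class posteriors for an untested chemical. This is a one-parameter sketch, not the COREPA-M decision tree. The pooling rule (averaging), the likelihood rule (the class bin probabilities averaged over the query chemical's own distribution), the names `class_distribution` and `posterior`, and the data are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def class_distribution(chemical_dists: Sequence[Sequence[float]]) -> List[float]:
    """Pool one class by averaging its chemicals' descriptor distributions bin by bin."""
    n = len(chemical_dists)
    n_bins = len(chemical_dists[0])
    return [sum(d[i] for d in chemical_dists) / n for i in range(n_bins)]


def posterior(query: Sequence[float],
              class_dists: Dict[str, Sequence[float]],
              priors: Dict[str, float]) -> Dict[str, float]:
    """P(class | chemical) via Bayes' rule.

    The likelihood of the untested chemical under a class is taken (as an
    assumption) to be sum_i q_i * p_class(i): the class bin probabilities
    averaged over the chemical's own descriptor distribution q.
    """
    unnormalized = {}
    for label, dist in class_dists.items():
        likelihood = sum(q * p for q, p in zip(query, dist))
        unnormalized[label] = priors[label] * likelihood
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("query distribution has no overlap with any class distribution")
    return {label: value / evidence for label, value in unnormalized.items()}


# Hypothetical training data: per-chemical distributions over the same three descriptor bins.
actives = class_distribution([[0.6, 0.4, 0.0], [0.8, 0.2, 0.0]])
inactives = class_distribution([[0.0, 0.5, 0.5], [0.2, 0.4, 0.4]])
post = posterior([0.5, 0.5, 0.0], {"active": actives, "inactive": inactives},
                 priors={"active": 0.5, "inactive": 0.5})
print(post)
```

Averaging per-chemical distributions gives every training chemical equal weight regardless of how many conformers it has. Pooling raw conformer counts instead would let highly flexible chemicals dominate the class distribution. This section does not state which choice COREPA makes.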