Rule interpreter: a chemical language for structure-based screening

Stoyan Karabunarliev a,b,*, Nina Nikolova c, Nikolay Nikolov d, Ovanes Mekenyan a

a Laboratory of Mathematical Chemistry, University ‘Assen Zlatarov’, 8010 Bourgas, Bulgaria
b Department of Chemistry, University of Houston, Houston, TX 77204-5003, USA
c Laboratory of Parallel and Distributed Processing, Bulgarian Academy of Sciences, 1756 Sofia, Bulgaria
d Center For Biomedical Engineering, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria

Abstract

A chemical language for defining and applying logical rules in the screening of chemical databases is described. The rules are based on user-defined screens, which combine substructure matching with constraints on molecular descriptors, stereochemical configurations and mutual 3D placements of chemical groups. Screens are written in an extended SMILES notation with the option to define variant chemical groups and constraints in a single entry. Rules are Boolean logic expressions composed of screens and preceding rules. Arbitrary decision trees can be constructed by using nested and conditional statements that refer to the rules defined. The language was used in a database-integrated QSAR expert system for aquatic toxicity, which exploits the concept of toxicochemical analogues. Another example of its usage addresses the prediction of androgen receptor binding affinity. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Rule description language; Rule interpreter; Database screening; Substructure search; Stereoelectronic constraints; Stereochemical screens

1. Introduction

In recent decades, considerable effort has been invested in methods for the preliminary assessment of toxicological hazards from chemical structure. Especially challenging is the development of structure-activity relationships (SARs) to screen large data sets of diverse chemical structures for toxicological activity in a technically sound manner.
Two SAR approaches can be outlined in this field [1–4]. The first focuses on the toxicodynamics of biological interactions and addresses the toxicochemical differentiation of chemicals. It uses pattern recognition techniques to identify those common features of molecular and electronic structure that result in a similar toxic action. The approach typically operates with toxicophores (the chemical groups responsible for specific mechanisms of action) or their stereoelectronic images. Once noncongeneric chemicals are toxicochemically differentiated, a second approach can be used to assess quantitatively the toxic potencies within groups of chemicals that share a common mechanism of action. This approach, loosely named correlative SAR, typically accounts for the toxicokinetic factors. Here, various molecular descriptors, ranging from measurable properties to quantum-chemical quantities, are used to explain the quantitative variation of a given kind of toxic potency.

Journal of Molecular Structure (Theochem) 622 (2003) 53–62. PII: S0166-1280(02)00617-6. www.elsevier.com/locate/theochem

* Corresponding author. Address: Laboratory of Mathematical Chemistry, University ‘Assen Zlatarov’, 8010 Bourgas, Bulgaria. E-mail addresses: karabunarliev@uh.edu (S. Karabunarliev), omekenya@btu.bg (O. Mekenyan).
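The two-stage scheme outlined in the Introduction (differentiate chemicals by mechanism first, then apply a correlative model within each group) can be illustrated with a minimal Python sketch. It is not the rule language described in this paper: the `Molecule` record, the group tags, the rule names, the descriptor ranges and the linear coefficients are all hypothetical placeholders chosen for illustration, and substructure matching is assumed to have been done beforehand and stored as simple tags.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record: pre-computed group tags and descriptor values."""
    name: str
    groups: FrozenSet[str]
    descriptors: Dict[str, float]


# A screen is any predicate on a molecule.
Screen = Callable[[Molecule], bool]


def has_group(tag: str) -> Screen:
    """Screen that passes when the (assumed pre-matched) group tag is present."""
    return lambda m: tag in m.groups


def descriptor_between(key: str, low: float, high: float) -> Screen:
    """Screen that passes when a descriptor lies in [low, high]."""
    def screen(m: Molecule) -> bool:
        value = m.descriptors.get(key)
        return value is not None and low <= value <= high
    return screen


# Rules are Boolean combinations of screens (or of earlier rules).
def all_of(*screens: Screen) -> Screen:
    return lambda m: all(s(m) for s in screens)


def any_of(*screens: Screen) -> Screen:
    return lambda m: any(s(m) for s in screens)


def negate(screen: Screen) -> Screen:
    return lambda m: not screen(m)


# Stage 1: ordered rules assign a mechanism class (names are illustrative).
RULES = [
    ("reactive", all_of(has_group("aldehyde"),
                        descriptor_between("logP", -1.0, 5.0))),
    ("baseline", negate(any_of(has_group("aldehyde"), has_group("nitro")))),
]

# Stage 2: one linear model per class, potency = a + b * logP.
# The (a, b) pairs are placeholders, not fitted values.
MODELS: Dict[str, Tuple[float, float]] = {
    "reactive": (1.0, 0.5),
    "baseline": (0.0, 1.0),
}


def classify(m: Molecule) -> str:
    """Return the label of the first rule that the molecule satisfies."""
    for label, rule in RULES:
        if rule(m):
            return label
    return "unclassified"


def predict(m: Molecule) -> Tuple[str, Optional[float]]:
    """Classify, then apply the class-specific model if one exists."""
    label = classify(m)
    if label not in MODELS:
        return label, None
    a, b = MODELS[label]
    return label, a + b * m.descriptors["logP"]


mol_a = Molecule("mol_a", frozenset({"aldehyde"}), {"logP": 2.0})
mol_b = Molecule("mol_b", frozenset({"hydroxyl"}), {"logP": 3.0})
mol_c = Molecule("mol_c", frozenset({"nitro"}), {"logP": 1.0})

for mol in (mol_a, mol_b, mol_c):
    print(mol.name, predict(mol))
```

Keeping screens as plain predicates means a rule can be reused inside a later rule without any special handling, which mirrors the idea that rules may be built from screens and preceding rules. A molecule that satisfies no rule is left unclassified and receives no potency estimate, rather than being forced into a class.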