Journal of Computer-Aided Molecular Design, 17: 13–38, 2003.
KLUWER/ESCOM
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.

Fast 3D molecular superposition and similarity search in databases of flexible molecules

Andreas Krämer, Hans W. Horn & Julia E. Rice
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120

MS received 10 October 2002; accepted in final form 28 February 2003

Key words: molecular similarity, molecular superposition, database search, flexible 3D superposition, virtual library screening

Summary

We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

Introduction

The virtual screening of compound databases is an important tool in modern drug design. Traditionally, two-dimensional or pharmacophore-based methods, which are very fast but have only limited accuracy, have been used for this purpose [1, 2, 3, 4].
In order to make database searches more accurate, the three-dimensional structure and conformational flexibility of the molecules have to be taken into account. In this paper we describe a highly efficient method (fFLASH) to perform a database search for molecules that are similar to a given conformation of a reference or query molecule, based on explicit three-dimensional, flexible superpositions.

3D molecular superposition methods have been successfully utilized to determine binding geometries relative to a reference molecule [5–14]. They play an important role in 3D-QSAR applications, pharmacophore elucidation, and receptor modelling, in situations where structural data of the target protein is not available. The variety of methodologies used for molecular superposition has recently been extensively reviewed by Lemmen et al. [15], and an application of existing superposition methods to virtual database screening has been reported in [16].

Of course, the use of molecular superposition to determine the binding capability of possible ligands has its limitations. The underlying assumption is that other ligands will have the same overall binding mode as the reference molecule. Also, the bound conformation of the reference molecule has to be known, which is generally only the case if crystallographic information about the corresponding protein-ligand complex is available. Therefore, in practical applications the reference molecule should have a non-flexible structure, or its bound conformation has to be inferred using other methods, e.g., deduced from simultaneous, flexible alignments within a set of ligands that are known to be active [17–22]. The bound conformation of the reference molecule can also be determined from distance constraints obtained in NMR (NOE) experiments, a possibility that has been outlined in [23].
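
To make the notion of clique-based pattern matching mentioned in the Summary more concrete, the following minimal Python sketch shows the generic idea on typed 3D feature points. It is an illustration under stated assumptions and not the fFLASH implementation: the function names (correspondence_graph, max_clique), the feature labels, the coordinates, and the distance tolerance of 0.5 Å are all hypothetical. Each node pairs a query feature with a database-molecule feature of the same type; two nodes are connected when the corresponding intra-molecular distances agree within the tolerance, so that a clique corresponds to a geometrically consistent partial match.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def correspondence_graph(query, target, tol=0.5):
    """Build a correspondence graph between two sets of typed 3D features.

    query, target: lists of (feature_type, (x, y, z)) tuples (hypothetical format).
    tol: allowed difference between paired distances (assumed value, in Angstrom).
    """
    # A node (i, a) pairs query feature i with target feature a of the same type.
    nodes = [(i, a)
             for i, (tq, _) in enumerate(query)
             for a, (tt, _) in enumerate(target)
             if tq == tt]
    adj = {n: set() for n in nodes}
    for (i, a), (j, b) in combinations(nodes, 2):
        if i == j or a == b:
            continue  # each feature may be matched at most once
        dq = dist(query[i][1], query[j][1])
        dt = dist(target[a][1], target[b][1])
        if abs(dq - dt) <= tol:  # distances are compatible
            adj[(i, a)].add((j, b))
            adj[(j, b)].add((i, a))
    return adj


def max_clique(adj):
    """Return one largest clique using Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)


# Hypothetical example: the target is the query shifted by (1, 1, 1),
# plus one unrelated acceptor far away.
query = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobe", (0, 4, 0))]
target = [("donor", (1, 1, 1)), ("acceptor", (4, 1, 1)),
          ("hydrophobe", (1, 5, 1)), ("acceptor", (10, 10, 10))]
matches = max_clique(correspondence_graph(query, target))
print(matches)  # [(0, 0), (1, 1), (2, 2)]
```

The matched pairs returned by such a search could then be passed to a rigid-body fit to obtain the actual superposition. Note that a purely distance-based compatibility test does not distinguish a feature arrangement from its mirror image, so a chirality check may be needed in addition.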