Journal of Computer-Aided Molecular Design, 17: 13–38, 2003.
KLUWER/ESCOM
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.
Fast 3D molecular superposition and similarity search in databases of
flexible molecules
Andreas Krämer, Hans W. Horn & Julia E. Rice
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120
MS received 10 October 2002; accepted in final form 28 February 2003
Key words: molecular similarity, molecular superposition, database search, flexible 3D superposition, virtual
library screening
Summary
We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-
dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into
account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes
a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast
clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid
query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular
features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able
to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database
containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query
processing times of the order of 0.1 seconds per molecule can be achieved on a PC.
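As an illustration of what clique-based pattern matching means in this context, the following minimal Python sketch matches the features of one rigid fragment conformer onto a rigid query. It is not the fFLASH implementation: the feature representation (a type label plus 3D coordinates), the distance tolerance `tol`, and the function names `correspondence_graph`, `maximal_cliques` and `best_match` are assumptions made for this example, and the clique search is the textbook Bron-Kerbosch algorithm with pivoting. Nodes of the correspondence graph pair a query feature with a fragment feature of the same type; two nodes are connected when the corresponding intra-molecular distances agree within `tol`, so that a clique is a set of mutually distance-compatible feature assignments.

```python
from itertools import combinations
from math import dist


def correspondence_graph(query, fragment, tol=0.5):
    """Build the correspondence graph as an adjacency dict.

    query, fragment: lists of (feature_type, (x, y, z)) tuples (assumed format).
    A node (i, j) pairs query feature i with fragment feature j of equal type.
    Two nodes are adjacent if they use distinct features on both sides and the
    query and fragment distances differ by at most tol (same units as coords).
    """
    nodes = [(i, j)
             for i, (tq, _) in enumerate(query)
             for j, (tf, _) in enumerate(fragment)
             if tq == tf]
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        (i, j), (k, l) = a, b
        if i == k or j == l:
            continue  # a feature may be assigned only once
        dq = dist(query[i][1], query[k][1])
        df = dist(fragment[j][1], fragment[l][1])
        if abs(dq - df) <= tol:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def best_match(query, fragment, tol=0.5):
    """Return the largest set of distance-compatible feature pairs."""
    cliques = maximal_cliques(correspondence_graph(query, fragment, tol))
    return max(cliques, key=len, default=[])


if __name__ == "__main__":
    query = [("donor", (0.0, 0.0, 0.0)),
             ("acceptor", (3.0, 0.0, 0.0)),
             ("hydrophobe", (0.0, 4.0, 0.0))]
    # Same triangle shifted by (1, 1, 1), plus one distant extra acceptor.
    fragment = [("donor", (1.0, 1.0, 1.0)),
                ("acceptor", (4.0, 1.0, 1.0)),
                ("hydrophobe", (1.0, 5.0, 1.0)),
                ("acceptor", (10.0, 10.0, 10.0))]
    print(best_match(query, fragment))
```

A matched clique only fixes which features correspond; an actual alignment would still require fitting a rigid transformation to the matched pairs. Note also that pairwise distances alone cannot distinguish a feature arrangement from its mirror image, so a real implementation would need an additional chirality check.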
Introduction
The virtual screening of compound databases is an
important tool in modern drug design. Traditionally,
two-dimensional or pharmacophore-based methods,
which are very fast but have only limited accuracy,
have been used for this purpose [1–4]. To make
database searches more accurate, the three-dimensional
structure and conformational flexibility of the
molecules have to be taken into account. In this
paper we describe a highly efficient method (fFLASH)
to perform a database search for molecules that are
similar to a given conformation of a reference or
query molecule, based on explicit three-dimensional,
flexible superpositions.
3D molecular superposition methods have been
successfully utilized to determine binding geometries
relative to a reference molecule [5–14]. They play
an important role in 3D-QSAR applications, pharmacophore
elucidation, and receptor modelling, in
situations where structural data for the target protein
are not available. The variety of methodologies used for
molecular superposition has recently been extensively
reviewed by Lemmen et al. [15], and an application
of existing superposition methods to virtual database
screening has been reported in [16].
Of course, the use of molecular superposition to
determine the binding capability of possible ligands
has its limitations. The underlying assumption is that
other ligands will have the same overall binding mode
as the reference molecule. Also, the bound conformation
of the reference molecule has to be known, which
is generally only the case if crystallographic information
about the corresponding protein-ligand complex
is available. Therefore, in practical applications the
reference molecule should have a non-flexible structure,
or its bound conformation has to be inferred using
other methods, e.g., deduced from simultaneous, flexible
alignments within a set of ligands that are known to
be active [17–22]. The bound conformation of the reference
molecule can also be determined from distance
constraints obtained in NMR (NOE) experiments, a
possibility that has been outlined in [23].