Focus Article

Mining flexible-receptor molecular docking data

Karina S. Machado,¹ Ana T. Winck,² Duncan D. Ruiz² and Osmar Norberto de Souza¹

Knowledge discovery in databases has become an integral part of practically every aspect of bioinformatics research, which usually produces, and has to process, very large amounts of data. Rational drug design is one of the scientific areas that has benefited greatly from bioinformatics, particularly in the step that analyzes receptor–ligand interactions via molecular docking simulations. An important challenge is the inclusion of receptor flexibility, since such simulations can become computationally very demanding. We have represented this explicit flexibility as a series of different conformations derived from a molecular dynamics simulation trajectory of the receptor. This model has been termed the fully flexible receptor (FFR) model. In our studies, the receptor is the enzyme InhA from Mycobacterium tuberculosis, which is the major drug target for the treatment of tuberculosis. The FFR model of InhA (named FFR InhA) was docked to four ligands, namely, nicotinamide adenine dinucleotide, pentacyano(isoniazid)ferrate II, triclosan, and ethionamide, thus generating very large amounts of data, which need to be mined to produce useful knowledge to help accelerate drug discovery and development. Very little work has been done in this area. In this article, we review our work on the application of classification decision trees, regression model trees, and association rules to properly preprocessed data from the FFR molecular docking results, and show how they can provide an improved understanding of the FFR InhA-ligand behavior. Furthermore, we explain how data mining techniques can support the acceleration of molecular docking simulations of FFR models.

© 2011 John Wiley & Sons, Inc.
WIREs Data Mining Knowl Discov 2011, 1 (November/December): 532–541. DOI: 10.1002/widm.46

Correspondence to: karina.machado@pucrs.br
Current address: Universidade Federal do Rio Grande, Centro de Ciências Computacionais–C3, Rio Grande, RS, Brasil
¹ Laboratório de Bioinformática, Modelagem e Simulação de Biossistemas, Faculdade de Informática, Porto Alegre, RS, Brasil
² Grupo de Pesquisa em Inteligência de Negócio, Faculdade de Informática, Porto Alegre, RS, Brasil

INTRODUCTION

Bioinformatics can be defined as the science of managing, mining, integrating, and interpreting information from biological data present at different levels.¹ Currently, with the substantial growth of biological data, the process of knowledge discovery in databases,² of which data mining is an integral part, plays a significant role in solving the problems of analyzing this massive amount of data.

One important research area in bioinformatics is rational drug design (RDD),³ since the costs involved in the development of new drugs have reached over one billion dollars. RDD has become an emergent technology for cost reduction and faster delivery of drugs to the market.⁴ A detailed understanding of the interactions between drug candidates (ligands) and target proteins (receptors), obtained through molecular docking simulations, is the computational basis of RDD.⁵ Given a target protein receptor, molecular docking simulations sample hundreds of thousands of orientations and conformations of a ligand inside the protein binding site, evaluate the free energy of binding (FEB), and rank the orientations/conformations according to their scores.⁶ The majority of molecular docking methods treat the ligands as flexible but the receptors as rigid molecules. However, proteins are inherently flexible systems, and this flexibility is frequently essential to determine their functions.⁷ Therefore, realistic docking simulations need to take into account the molecular flexibility of both receptor and ligand, since in many cases the key–lock model does not work well and the induced-fit model is more appropriate.⁸
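The sample, score, and rank steps described above, combined with the idea of docking against several receptor conformations rather than one rigid structure, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Pose` record, its field names, the helper functions, and all FEB values are hypothetical and chosen only for illustration. The sketch assumes that a lower (more negative) FEB indicates a more favorable pose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """One docked ligand pose (hypothetical record layout)."""
    snapshot: int  # index of the receptor conformation (e.g. an MD snapshot)
    pose_id: int   # index of the ligand orientation/conformation
    feb: float     # estimated free energy of binding, kcal/mol (lower is better)


def rank_poses(poses):
    """Return all poses sorted from most to least favorable FEB."""
    return sorted(poses, key=lambda p: p.feb)


def best_pose_per_snapshot(poses):
    """Keep, for each receptor snapshot, the pose with the lowest FEB."""
    best = {}
    for p in poses:
        if p.snapshot not in best or p.feb < best[p.snapshot].feb:
            best[p.snapshot] = p
    return best


# Made-up values for illustration only; these are not results from the article.
poses = [
    Pose(snapshot=0, pose_id=0, feb=-7.2),
    Pose(snapshot=0, pose_id=1, feb=-8.9),
    Pose(snapshot=1, pose_id=0, feb=-6.1),
    Pose(snapshot=1, pose_id=1, feb=-5.4),
    Pose(snapshot=2, pose_id=0, feb=-9.3),
]

ranked = rank_poses(poses)
best = best_pose_per_snapshot(poses)

for snap in sorted(best):
    print(f"snapshot {snap}: best pose {best[snap].pose_id}, FEB {best[snap].feb} kcal/mol")
```

With a rigid receptor there would be a single snapshot and only the ranking step would matter. Keeping one best pose per snapshot is one simple way to summarize how the estimated FEB varies across receptor conformations, which is the kind of per-conformation data that later mining steps could take as input.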