One-class classification as a novel method of ligand-based virtual screening: The case of glycogen synthase kinase 3β inhibitors

Pavel V. Karpov a, Dmitry I. Osolodkin a, Igor I. Baskin a,b, Vladimir A. Palyulin a,c,⇑, Nikolay S. Zefirov a,c

a Department of Chemistry, Moscow State University, Leninskie Gory 1/3, Moscow 119991, Russia
b Laboratoire d’Infochimie, UMR 7177 CNRS, Université de Strasbourg 4, rue B. Pascal, Strasbourg 67000, France
c Institute of Physiologically Active Compounds, RAS, Severny proezd 1, Chernogolovka, Moscow Region 142432, Russia

Article info
Article history: Received 22 August 2011; Revised 13 September 2011; Accepted 14 September 2011; Available online 21 September 2011
Keywords: One-class classification; Virtual screening; Glycogen synthase kinase inhibitors; Neural networks; Auto-encoders

Abstract
A virtual screening system based on one-class classification with molecular fingerprints as descriptors is developed and tested on a series of 1226 inhibitors and 209 noninhibitors of glycogen synthase kinase 3β (GSK-3β). The suggested system outperforms the ones based on a pharmacophore hypothesis and on molecular docking in a retrospective study. However, in a prospective study it should not be used as the sole classifier. The system is exceptionally useful for identifying new scaffolds among the virtual screening results obtained with other methods.
© 2011 Elsevier Ltd. All rights reserved.

The main task of virtual screening (VS) is to discriminate putative active compounds from inactive ones. Most classification methods require both classes, active and inactive compounds, during ligand-based model construction, so that a hyperplane separating active samples from inactive ones can be found in the feature space.
The problem is that data for the inactive samples should be collected under the same conditions as for the active ones. Usually, however, information on inactive compounds is not available, and researchers are obliged to create decoy datasets themselves using their own rules (e.g., Refs. 1,2), which leads to several disadvantages. First, some decoys may actually be active. Second, the performance of a model is influenced by the quality of the data and the chemical diversity of the training dataset. Therefore, using standard classification procedures is methodologically incorrect when the decoy set is not rigorously defined.

The simplest and fastest VS method is the similarity search, in which a reference molecule with the desired biological activity is selected and all compounds from a database are ranked by their similarity to the reference molecule (Ref. 3). This procedure generally requires neither building a model nor using negative samples, but it has recently been shown that the similarity search works only when the structure/activity surface is smooth enough and there are no ‘activity cliffs’ (Ref. 4). This condition is difficult to meet because such information is unattainable a priori, so new similarity-search methods are needed.

All these problems could be resolved using the one-class classification (OCC) method (Refs. 5–7), the main idea of which is to construct the model from the active compounds exclusively. In this Letter, we demonstrate the application of OCC to the VS of glycogen synthase kinase 3β (GSK-3β) inhibitors (for a recent review of this target see Ref. 8).

The reconstruction methods of the OCC approach include auto-encoder neural networks, self-organizing maps (SOM) and principal component analysis (PCA). Auto-encoder (replicator, bottleneck or sand-glass) networks (Ref. 9) are feed-forward neural networks that have at least one hidden layer with many times fewer neurons than the other layers (Fig. 1).
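The reconstruction idea behind these OCC methods can be illustrated with a minimal sketch. It is not the implementation used in this Letter: it swaps the auto-encoder for single-component PCA (one of the reconstruction methods listed above), uses toy three-dimensional vectors in place of molecular fingerprints, and all function names are hypothetical. The model is fitted on "active" samples only, and a query is scored by how badly it is reconstructed after being compressed to one number.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def top_component(centered, iterations=100):
    """Leading principal direction, found by power iteration on X^T X."""
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim  # arbitrary non-zero start vector
    for _ in range(iterations):
        # w = X^T (X v), computed without forming the covariance matrix
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data: keep the current direction
            break
        v = [x / norm for x in w]
    return v

def fit(actives):
    """Fit the one-class model on active samples only (no decoys needed)."""
    mu = mean_vector(actives)
    centered = [[x - m for x, m in zip(row, mu)] for row in actives]
    return mu, top_component(centered)

def reconstruction_error(model, x):
    """Encode x to a single score, decode it, and return the residual norm."""
    mu, v = model
    c = [a - m for a, m in zip(x, mu)]
    score = sum(a * b for a, b in zip(c, v))  # "bottleneck" value
    recon = [score * b for b in v]            # decoded approximation
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(c, recon)))

# Toy "actives": all lie along one direction of the feature space.
actives = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0], [4.0, 4.0, 0.0]]
model = fit(actives)

err_like = reconstruction_error(model, [5.0, 5.0, 0.0])    # resembles actives
err_unlike = reconstruction_error(model, [5.0, 0.0, 3.0])  # does not

# Hypothetical decision rule: accept a query if its error is below a cutoff.
threshold = 1.0  # assumed value for this toy data only
print(err_like < threshold, err_unlike < threshold)
```

A query that follows the pattern of the training actives is reconstructed almost perfectly, while a dissimilar one leaves a large residual. In practice the cutoff would have to be chosen from the data, for example from the distribution of errors on held-out actives.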
Figure 1. Typical scheme of an auto-encoder neural network.

Bioorganic & Medicinal Chemistry Letters 21 (2011) 6728–6731
0960-894X/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.bmcl.2011.09.051
⇑ Corresponding author. Tel.: +7 495 939 39 69; fax: +7 495 939 02 90. E-mail addresses: vap@org.chem.msu.su, vap@qsar.chem.msu.ru (V.A. Palyulin).

This layer