ORIGINAL PAPER

Two new atom centered fragment descriptors and scoring function enhance classification of antibacterial activity

Durga Datta Kandel, Chandan Raychaudhury, and Debnath Pal

Received: 8 November 2013 / Accepted: 30 January 2014 / Published online: 25 March 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract
Classifying the pharmacological activity of a chemical compound is an essential step in any drug discovery process. We develop two new atom-centered fragment descriptors (vertex indices). One is based solely on topology, without discriminating atom or bond types. The other is based on both topological and electronic features. We assess their usefulness by devising a method to rank and classify molecules with respect to their antibacterial activity. On large heterogeneous data sets for hit finding and hit-to-lead studies, the classification performance of our method is superior to that of two previous studies, even though we use far fewer parameters. We find that for hit finding, topological features (simple graph) alone provide significant discriminating power. For the hit-to-lead process, a small but consistent improvement can be obtained by additionally including electronic features (colored graph). Our approach is simple, interpretable, and suitable for molecular design, as it does not use any physicochemical properties. The key elements of this study are the sole use of vertex indices as descriptors, a novel range-based feature extraction, and rigorous statistical validation.

Keywords: Chemical information · Classification · Molecular fragment descriptors · QSAR · Scoring

Introduction
Deciphering the activity of a chemical compound against a pathogenic organism is an essential step in the drug discovery process. In a seemingly infinite chemical space, exhaustively testing all possible molecules in order to rationally select a subset is a formidable task.
Thus, the use of computational methods to evaluate the pharmacological activity of potential drug molecules is not just useful but indispensable. Well-validated computational (in silico) methods can complement in vitro and in vivo experiments, thereby reducing the massive cost and effort involved in the drug discovery process. This is especially important for diseases that claim thousands of lives worldwide every year, such as tuberculosis and various other bacterial infections [1–3].

Broadly, the task of computational modeling in drug discovery can be divided into two parts. The first is identifying the molecular properties or features that could be responsible for a given biological activity of a molecule. The second is developing algorithms or models, based on mathematical or statistical techniques, that use those features rationally to make useful activity predictions. In particular, recognizing the specific substructures or features that impart activity to a molecule not only provides a good starting point for understanding fundamental aspects of the biological activity of molecules but can also guide structural modification and optimization in the discovery of new chemical entities.

Electronic supplementary material: The online version of this article (doi:10.1007/s00894-014-2164-1) contains supplementary material, which is available to authorized users.

D. D. Kandel · C. Raychaudhury · D. Pal (*)
Indian Institute of Science, Bangalore, India
e-mail: dpal@serc.iisc.ernet.in

J Mol Model (2014) 20:2164
DOI 10.1007/s00894-014-2164-1

In general, molecular properties computed for the whole molecule, referred to as molecular descriptors [4–10], are more widely used to characterize chemical compounds. Notably, indices based on molecular topology have often been found to correlate with the biological, physical, and physicochemical properties of molecules [6–8], incorporating electronic [11], geometric [12] or even empirical factors