DOI: 10.1002/minf.201700094

R-based Tool for a Pairwise Structure-Activity Relationship Analysis

Kyrylo Klimenko* [a]

Abstract: Structure-Activity Relationship (SAR) analysis is a complex process that can be enhanced by computational techniques. This article describes a simple tool for SAR analysis that has a graphical user interface and a flexible approach to the input of molecular data. The application calculates molecular similarity, represented by the Tanimoto index and Euclidean distance, and identifies activity cliffs by means of the Structure-Activity Landscape Index. The calculation is performed in a pairwise manner, either between a reference compound and the other compounds or for all possible pairs in the data set. The results of the SAR analysis are visualized using two types of plot. The capability of the application is demonstrated by the analysis of a set of COX2 inhibitors with respect to Isoxicam. The tool is available online, together with a manual and example input files.

Keywords: Structure-activity relationships · molecular similarity · structure-activity landscape · gWidgets · drug discovery

Structure-Activity Relationship (SAR) analysis has been a useful technique for drug discovery, particularly at the hit-to-lead stage. [1] According to refs. [2,3], SARs can be either continuous (“activity hills”) or discontinuous (“activity cliffs”), depending on whether small changes in compound structure lead to small or dramatic changes in activity. In the presence of gently rolling hills, or continuous SARs, small changes in molecular structure will cause small effects on activity and the ‘biological activity radius’ will be populated by a spectrum of increasingly diverse structures of similar activity. This is in contrast to discontinuous SARs, where small changes in structure have dramatic effects.
[3] Both types of relationship give insight into beneficial and detrimental structural modifications of hit compounds. Even though software for similarity search [4,5] and quantification of activity cliffs [6,7] exists, it either has commercial restrictions [4,6] or requires advanced knowledge of computing, [5,7] which keeps some medicinal chemists from using it. For instance, R packages such as “ChemmineR” [8] and “rcdk” [9] provide various molecular descriptors and data mining techniques for SAR analysis; however, they require users to perform calculations via the R command line, which may not be convenient for an inexperienced user. Thus, a free, user-friendly tool was created for pairwise Structure-Activity Relationship analysis.

Structure-Activity Relationship Analyser (SARA) is an R-based application built for R version 3.1.3 [10] that runs on both Windows and Linux. It consists of several scripts and a graphical user interface (Figure 1A) created using the “gWidgets” and “gWidgetstcltk” R packages. [11,12] The tool calculates molecular similarity and the Structure-Activity Landscape Index (SALI), and visualises the results of the SAR analysis.

SARA is flexible with respect to how molecular structure is represented. It can either calculate descriptors from a chemical structure file in the SMILES format or use a TXT file with descriptors pre-calculated by the user. The tool computes 22 molecular descriptors (Table S1) that provide a basic representation of molecular structure in case the user has no a priori knowledge of the optimal descriptors for the particular task. The SMILES format is useful because online databases (e.g. ChEMBL, [13] PubChem [14]) store information on compound structure as SMILES strings, which allows the SAR analysis to start immediately after data extraction and curation. If the user has a clear hypothesis about a possible SAR, then more elaborate descriptors (e.g. PSA, cLogP) can be calculated beforehand and supplied as a text file input.
Descriptors are used to compute molecular similarity by means of the Tanimoto coefficient or Euclidean distance. Both parameters are computed for every pair of compounds in the data set, in the form of a matrix.

Since this tool is designed to contribute to SAR investigations around a core compound structure, the analysis of the structure-activity relationship is carried out in a pairwise manner. There is an option to select a reference compound for optimization and compare it against every other compound in the set in terms of both structure and activity. After the data upload, the user may select any molecule from the data set as the reference. Once the molecular similarity between the reference compound and the rest of the data set is known, the calculation of the Structure-Activity Landscape Index becomes possible.

[a] K. Klimenko
Department of molecular structure and chemoinformatics, A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine
Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
E-mail: alhimikir@gmail.com

Supporting information for this article is available on the WWW under https://doi.org/10.1002/minf.201700094

Application Note, www.molinf.com
© 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
Mol. Inf. 2017, 36, 1700094
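The pairwise similarity calculation described in the text can be illustrated with a minimal Python sketch. It is not SARA's R code. It assumes the continuous-valued form of the Tanimoto coefficient, T(a,b) = a·b / (|a|² + |b|² − a·b), which is one common choice for numeric descriptor vectors; the text does not state which form the tool uses. The function names, the three-compound descriptor table and the activity values are hypothetical and serve only as an example.

```python
import math

def tanimoto(a, b):
    """Continuous-valued Tanimoto coefficient of two descriptor vectors (assumed form)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical.
    return dot / denom if denom else 1.0

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_matrix(vectors, metric):
    """Square matrix of metric values for every pair of compounds."""
    n = len(vectors)
    return [[metric(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def compare_to_reference(vectors, activities, ref):
    """List (index, similarity, activity difference) for each non-reference
    compound, most similar to the reference first."""
    rows = [
        (i, tanimoto(vectors[ref], v), activities[i] - activities[ref])
        for i, v in enumerate(vectors)
        if i != ref
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Hypothetical toy data: three compounds, three descriptors each.
descriptors = [[1.0, 2.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 3.0]]
activities = [6.5, 7.9, 5.0]  # e.g. pIC50 values, invented for illustration

sim = pairwise_matrix(descriptors, tanimoto)    # similarity matrix
dist = pairwise_matrix(descriptors, euclidean)  # distance matrix
ranked = compare_to_reference(descriptors, activities, ref=0)

for index, similarity, delta in ranked:
    print(f"compound {index}: Tanimoto = {similarity:.3f}, activity difference = {delta:+.2f}")
```

With compound 0 as the reference, compound 1 is ranked first (Tanimoto 5/6) and compound 2 second (Tanimoto 2/13). In practice, descriptors on very different scales would probably need normalising before either measure is meaningful; that step is left out here.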