DOI: 10.1002/minf.201700094
R-based Tool for a Pairwise Structure-Activity Relationship
Analysis
Kyrylo Klimenko*[a]
Abstract: Structure-Activity Relationship (SAR) analysis is a
complex process that can be enhanced by computational
techniques. This article describes a simple tool for SAR
analysis that has a graphical user interface and a flexible
approach to the input of molecular data. The application
calculates molecular similarity, represented by the Tanimoto
index and Euclidean distance, and identifies activity cliffs
by means of the Structure-Activity Landscape Index. The
calculation is performed in a pairwise manner, either between
a reference compound and the other compounds or for all
possible pairs in the data set. The results of the SAR
analysis are visualized using two types of plot. The
capability of the application is demonstrated by the analysis
of a set of COX2 inhibitors with respect to Isoxicam. The
tool is available online together with a manual and example
input files.
Keywords: Structure-activity relationships · molecular similarity · structure-activity landscape · gWidgets · drug discovery
Structure-Activity Relationship (SAR) analysis has been a
useful technique for drug discovery, particularly at the
hit-to-lead stage.[1] According to refs. [2,3], SARs can be
either continuous (“activity hills”) or discontinuous
(“activity cliffs”), depending on whether small changes in
compound structure lead to small or dramatic changes in
activity. In the presence of gently rolling hills, or
continuous SARs, small changes in molecular structure will
cause small effects on activity and the ‘biological activity
radius’ will be populated by a spectrum of increasingly
diverse structures of similar activity. This is in contrast
to discontinuous SARs, where small changes in structure have
dramatic effects.[3] Both types of relationship give insight
into beneficial and detrimental structure modifications of
hit compounds. Even though software for similarity
search[4,5] and quantification of activity cliffs[6,7]
exists, it either has commercial restrictions[4,6] or
requires advanced knowledge of computing,[5,7] which keeps
some medicinal chemists from using it. For instance, R
packages such as “ChemmineR”[8] and “rcdk”[9] provide various
molecular descriptors and data mining techniques for SAR
analysis; however, they require users to perform calculations
via the R command line, which may not be convenient for the
inexperienced user. Thus, a free, user-friendly tool was
created for pairwise Structure-Activity Relationship analysis.
Structure-Activity Relationship Analyser (SARA) is an R-based
application built for R version 3.1.3[10] that runs on both
Windows and Linux. It consists of several scripts and a
graphical user interface (Figure 1A) created using the
“gWidgets” and “gWidgetstcltk” R packages.[11,12] The tool
calculates molecular similarity and the Structure-Activity
Landscape Index (SALI), and visualises the results of the SAR
analysis.
SARA is flexible with respect to how molecular structure is
represented. It can either calculate descriptors from a
chemical structure file in the SMILES format or use a TXT
file with descriptors pre-calculated by the user. The tool
computes 22 molecular descriptors (Table S1) that provide a
basic representation of molecular structure in case the user
has no a priori knowledge of the optimal descriptors for the
particular task. The SMILES format is useful because online
databases (e.g. ChEMBL,[13] PubChem[14]) store information on
compound structure as SMILES strings, which allows the SAR
analysis to start immediately after data extraction and
curation. If the user has a clear hypothesis about a possible
SAR, then more elaborate descriptors (e.g. PSA, cLogP) can be
calculated beforehand and supplied as a text file input.
Descriptors are used to compute molecular similarity by
means of the Tanimoto coefficient or Euclidean distance. Both
parameters are computed for every pair of compounds in the
data set and stored in the form of a matrix.
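
The pairwise similarity step can be illustrated with a minimal Python sketch. It is not the tool's own R code: the function names and the three toy descriptor vectors are hypothetical, and it assumes the continuous (real-valued) form of the Tanimoto coefficient, T(a,b) = a·b / (|a|² + |b|² − a·b), applied to descriptor vectors that have already been scaled.

```python
import math


def tanimoto(a, b):
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # The denominator is zero only when both vectors are all zeros;
    # treating that pair as identical is an assumption of this sketch.
    return dot / denom if denom else 1.0


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_matrix(descriptors, metric):
    """Square matrix holding metric(i, j) for every pair of compounds."""
    n = len(descriptors)
    return [[metric(descriptors[i], descriptors[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical descriptor vectors for three compounds (illustrative only).
descriptors = [
    [1.0, 0.0, 2.0],
    [1.0, 1.0, 2.0],
    [0.0, 3.0, 1.0],
]

tanimoto_matrix = pairwise_matrix(descriptors, tanimoto)
distance_matrix = pairwise_matrix(descriptors, euclidean)
```

Both matrices are symmetric, so a row taken at the index of a chosen compound gives its similarity (or distance) to every other compound in the set.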
Since this tool is designed to support SAR investigations
around the structure of a core compound, the analysis of the
structure-activity relationship is carried out in a pairwise
manner. There is an option to select a reference compound for
optimization and compare it against every other compound in
the set in terms of both structure and activity. After the
data upload, the user may select any molecule from the data
set as the reference.
Once the molecular similarity between the reference compound
and the rest of the data set is known, the Structure-Activity
Landscape Index can be calculated.
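
As a rough illustration of this step, the Python sketch below scores every compound against a reference. It assumes the commonly used definition SALI(i,j) = |A_i − A_j| / (1 − sim(i,j)), which this section does not spell out; the activity values, similarity values, function names and the handling of sim = 1 are all hypothetical choices of the sketch, not taken from the tool.

```python
def sali(activity_i, activity_j, similarity):
    """SALI for one pair, assuming |A_i - A_j| / (1 - sim(i, j))."""
    distance = 1.0 - similarity
    if distance <= 0.0:
        # Identical descriptors: the index is undefined. Returning inf for
        # differing activities and 0 otherwise is a convention of this sketch.
        return float("inf") if activity_i != activity_j else 0.0
    return abs(activity_i - activity_j) / distance


def sali_against_reference(activities, similarities_to_ref, ref_index):
    """Map each non-reference compound index to its SALI versus the reference."""
    ref_activity = activities[ref_index]
    return {
        i: sali(ref_activity, activity, similarities_to_ref[i])
        for i, activity in enumerate(activities)
        if i != ref_index
    }


# Hypothetical activities (e.g. pIC50) and similarities to compound 0.
activities = [7.2, 7.0, 5.1, 7.3]
similarities_to_ref = [1.0, 0.9, 0.95, 0.5]

scores = sali_against_reference(activities, similarities_to_ref, ref_index=0)

# Highest SALI first: candidates for activity cliffs relative to the reference.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this toy data, compound 2 is very similar to the reference yet much less active, so it receives by far the largest score and heads the ranking.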
[a] K. Klimenko
Department of molecular structure and chemoinformatics, A.V.
Bogatsky Physico-Chemical Institute of NAS of Ukraine
Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
E-mail: alhimikir@gmail.com
Supporting information for this article is available on the WWW
under https://doi.org/10.1002/minf.201700094
Application Note www.molinf.com
© 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Mol. Inf. 2017, 36, 1700094 (1 of 5) 1700094