Fuzzy Sets and Systems 157 (2006) 2347 – 2355
www.elsevier.com/locate/fss
Pseudometrics from three-positive semidefinite similarities
M. Santos Tomás (a,∗), Claudi Alsina (a), Jaime Rubio-Martinez (b)
(a) Sec. Matemàtiques ETSAB, Universitat Politècnica de Catalunya (UPC), Av. Diagonal 649, E-08028 Barcelona, Spain
(b) Departament de Química Física, Universitat de Barcelona (UB), Martí i Franqués 1, E-08028 Barcelona, Spain
Received 5 September 2005; received in revised form 27 December 2005; accepted 28 February 2006
Available online 3 April 2006
Abstract
We prove that when some transformations are applied to three-positive semidefinite similarities we obtain a pseudometric. In
addition, we demonstrate that some similarity coefficients usually employed in diversity studies fulfil this condition.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Pseudometric; Metric; Dissimilarity; Similarity; Tanimoto; Dice; Cosine
1. Introduction
1.1. The identification of homogeneous subgroups within a collection of heterogeneous objects is one of the most
common tasks in computing. One of the principal reasons for the growing interest in these methods is their use in
combinatorial chemistry, where large libraries of compounds are designed in order to find new compounds with drug
properties. As the size of those libraries is usually unmanageable, it is necessary to select a subset that includes as
much diversity as possible without redundancy. It is generally believed that starting with a diverse set of
compounds offers the best chance of finding active compounds. For this reason, it is necessary to quantify the degree
of resemblance between all possible pairs, in order to find those that are most similar. To achieve this goal, a
similarity measure is needed.
A molecular similarity measure involves at least two principal components: (1) the representation, used to characterize
the molecules that will be compared, and (2) the similarity coefficient, used as a quantitative measure of the degree of
resemblance between pairs of such representations [9].
Compounds belonging to a chemical library or, in general, objects in a group G, can be described by n attributes or
descriptors in such a way that a vector $X_i = \{x_{1i}, x_{2i}, \ldots, x_{ni}\}$, $X_i \in G$, defines the position of each object in this
n-dimensional space. Different sets of descriptors generate different representations of the group. Descriptors may be
of binary nature (i.e. dichotomous) or real numbers describing different properties of the objects.
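
As an illustration of the two components described above (a representation and a similarity coefficient), the following minimal Python sketch stores a small group G as descriptor vectors and evaluates one coefficient on every pair. The objects `X1`, `X2`, `X3`, their descriptor values, and the helper names `tanimoto` and `pairwise_similarities` are hypothetical and chosen only for illustration. The coefficient is the standard Tanimoto form x·y / (|x|² + |y|² − x·y), named in the keywords; returning 1 for two all-zero vectors is an assumed convention, not something stated in the text.

```python
from itertools import combinations


def tanimoto(x, y):
    """Tanimoto coefficient x.y / (|x|^2 + |y|^2 - x.y).

    Works for binary (0/1) and real-valued descriptor vectors of equal length.
    """
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Assumed convention: two all-zero vectors are treated as identical.
    return 1.0 if denom == 0 else dot / denom


def pairwise_similarities(group, coefficient):
    """Evaluate the coefficient on every unordered pair of objects in the group."""
    return {
        (i, j): coefficient(group[i], group[j])
        for i, j in combinations(sorted(group), 2)
    }


# Hypothetical group G: three objects, each described by n = 4 binary descriptors.
G = {
    "X1": [1, 1, 0, 1],
    "X2": [1, 0, 0, 1],
    "X3": [0, 0, 1, 0],
}

similarities = pairwise_similarities(G, tanimoto)
for pair, value in similarities.items():
    print(pair, round(value, 3))
```

In this toy group, `X1` and `X2` share two of their set descriptors and obtain a similarity of 2/3, while `X3` shares none with either and obtains 0. Swapping the descriptor set or the coefficient changes the resulting picture of the group, which is why both choices matter.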
∗ Corresponding author. Tel.: +34 93 4016373; fax: +34 93 4016372.
E-mail address: maria.santos.tomas@upc.edu (M.S. Tomás).
0165-0114/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.fss.2006.02.009