Original Article
International Journal of Fuzzy Logic and Intelligent Systems
Vol. 20, No. 2, June 2020, pp. 156-167
http://doi.org/10.5391/IJFIS.2020.20.2.156
ISSN(Print) 1598-2645
ISSN(Online) 2093-744X
Automatic Determination of the Number of
Clusters for Semi-Supervised Relational
Fuzzy Clustering
Norah Ibrahim Fantoukh, Mohamed Maher Ben Ismail, and Ouiem Bchir
Department of Computer Science, College of Computer and Information Sciences, King Saud University,
Riyadh, Saudi Arabia
Abstract
Semi-supervised clustering relies on both labeled and unlabeled data to steer the clustering
process towards optimal categorization and escape from local minima. In this paper, we propose
a novel fuzzy relational semi-supervised clustering algorithm based on an adaptive local
distance measure (SSRF-CA). The proposed clustering algorithm utilizes side-information
and formulates it as a set of constraints to supervise the learning task. These constraints
are expressed using reward and penalty terms, which are integrated into a novel objective
function. In particular, we formulate the clustering task as an optimization problem through the
minimization of the proposed objective function. Solving this optimization problem provides
the optimal values of different objective function parameters and yields the proposed semi-
supervised clustering algorithm. Along with its ability to perform data clustering and learn the
underlying dissimilarity measure between the data instances, our algorithm determines the
optimal number of clusters in an unsupervised manner. Moreover, the proposed SSRF-CA
is designed to handle relational data. This makes it appropriate for applications where only
pairwise similarity (or dissimilarity) information between data instances is available. Using
various synthetic and real-world benchmark datasets that contain varying numbers of clusters
with diverse shapes, we demonstrate the ability of the proposed algorithm to learn the
appropriate local distance measures and the optimal number of clusters while partitioning
the data. The experimental results show that the proposed SSRF-CA achieved the best
performance among the compared state-of-the-art algorithms.
Keywords: Semi-supervised clustering, Relational data, Fuzzy clustering, Local distance
measure learning, Optimal number of clusters
Received: Feb. 16, 2020
Revised: May 10, 2020
Accepted: May 26, 2020
Correspondence to:
Mohamed Maher Ben Ismail and Ouiem Bchir
(maher.benismail@gmail.com,
ouiem.bchir@gmail.com)
© The Korean Institute of Intelligent Systems
This is an Open Access article distributed under the terms of the Creative Commons
Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which
permits unrestricted non-commercial use, distribution, and reproduction in any medium,
provided the original work is properly cited.
1. Introduction
Clustering is one of the most popular unsupervised learning techniques and is commonly
used in the data mining and pattern recognition fields [1, 2]. The resulting categories consist
of sets of homogeneous patterns [1]. Accordingly, data instances that belong to the same
cluster are more similar to each other than to those from other
clusters. Clustering can be perceived as a data modeling technique that yields concise data
summarization. Recently, clustering approaches have gained attention because they play a key