Knowledge and Information Systems
https://doi.org/10.1007/s10115-019-01384-9
REGULAR PAPER
Two approaches for clustering algorithms with
relational-based data
João C. Xavier-Junior¹ · Anne M. P. Canuto² · Luiz M. G. Gonçalves³
Received: 6 December 2015 / Revised: 6 July 2019 / Accepted: 11 July 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019
Abstract
It is well known that relational databases still play an important role for many companies
around the world. For this reason, the use of data mining methods to discover knowledge
in large relational databases has become an interesting research issue. In the context of
unsupervised data mining, for instance, the conventional clustering algorithms cannot handle
the particularities of the relational databases in an efficient way. There are some clustering
algorithms for relational datasets proposed in the literature. However, most of these methods
apply complex and/or specific procedures to handle the relational nature of data, or the
relational-based methods do not capture the relational nature in an efficient way. Aiming
to contribute to this important topic, in this paper, we will present two simple and generic
approaches to handle relational-based data for clustering algorithms. One of them treats
the relational data through the use of a hierarchical structure, while the second approach
applies a weight structure based on relationship and attribute information. In presenting
these two approaches, we aim to tackle relational-based datasets in a simple and efficient way,
improving the efficiency of corporations that handle relational-based data in the unsupervised
data mining context. In order to evaluate the effectiveness of the presented approaches, a
comparative analysis will be conducted, comparing the proposed approaches with some
existing approaches and with a baseline approach. In all analyzed approaches, we will use
two well-known types of clustering algorithms (agglomerative hierarchical and K-means).
In order to perform this analysis, we will use two internal and one external cluster validity
measures.
Keywords Relational database · Relational data clustering approach ·
Cluster validity measures
✉ João C. Xavier-Junior
jcxavier@imd.ufrn.br
Anne M. P. Canuto
anne@dimap.ufrn.br
Luiz M. G. Gonçalves
lmarcos@dca.ufrn.br
¹ Digital Metropolis Institute, Federal University of RN, Natal, RN, Brazil
² Informatics and Applied Mathematics Department, Federal University of RN, Natal, RN, Brazil
³ Computing and Automation Engineering Department, Federal University of RN, Natal, RN, Brazil