B.K. Panigrahi et al. (Eds.): SEMCCO 2012, LNCS 7677, pp. 143–150, 2012.
© Springer-Verlag Berlin Heidelberg 2012
Clustering Algorithm Recommendation:
A Meta-learning Approach
Daniel G. Ferrari and Leandro Nunes de Castro
Natural Computing Laboratory (LCoN)
Mackenzie Presbyterian University,
São Paulo, Brazil
ferrari.dg@gmail.com,
lnunes@mackenzie.br
Abstract. Meta-learning is a technique that aims at understanding what types of
algorithms solve what kinds of problems. Clustering, in turn, divides a
dataset into groups based on the objects’ similarities, without requiring
prior knowledge of the objects’ labels. The present paper proposes the
use of meta-learning to recommend clustering algorithms based on the feature
extraction of unlabelled objects. The features of the clustering problems will be
evaluated along with the ranking of different algorithms so that the
meta-learning system can accurately recommend the best algorithms for a new
problem.
Keywords: clustering, algorithm recommendation, ranking, meta-learning.
1 Introduction
There is currently a huge amount of information represented and stored as data for
later analysis [1]. Researchers began to dedicate themselves to the development
of methods to extract knowledge from data; the process of applying these methods is
known as data mining [2]. Nowadays, data mining tools offer a variety of
algorithms for each of the many data mining tasks. However, the mining
process suffers from a lack of guidelines for selecting the best algorithm to solve a given
data mining problem [3].
The meta-learning field aims to find which problem features contribute
to better or worse performance of an algorithm [4] and, from this, to recommend the
most appropriate algorithm for solving a given problem [3]. To reach this objective,
meta-learning builds two key sets: (1) Meta-attributes: the set of features that is
common to several instances of a class of problems, such as the number of objects
and the number of binary attributes, among others; (2) Ranking: the set with rank
positions, based on a performance evaluation, of several algorithms applied to the
same problems. From these two sets, a model is built to recommend the ranking
of the algorithms when applied to other problems, not used for training, based on the
proposed meta-attributes.
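The two sets and the recommendation step can be illustrated with a minimal Python sketch. It is not the model used in this paper: the nearest-neighbour recommender, the function names, the algorithm labels (`alg_A`, `alg_B`, `alg_C`) and all numeric values are assumptions chosen only for illustration. The sketch extracts two of the meta-attributes mentioned above (number of objects and number of binary attributes), converts performance scores into rank positions, and recommends a ranking for a new problem by averaging the rankings of its closest training problems.

```python
import math


def extract_meta_attributes(dataset):
    """Return [number of objects, number of binary attributes] for an
    unlabelled dataset given as a list of equal-length numeric rows."""
    n_objects = len(dataset)
    n_attributes = len(dataset[0]) if dataset else 0
    n_binary = sum(
        1
        for j in range(n_attributes)
        if {row[j] for row in dataset} <= {0, 1}
    )
    return [n_objects, n_binary]


def rank_algorithms(scores):
    """Convert performance scores (higher is better) into rank positions,
    1 being the best. Ties are broken by algorithm name (an assumption)."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def recommend_ranking(meta_base, new_meta, k=2):
    """Recommend an algorithm ordering for a new problem.

    meta_base is a list of (meta_attributes, ranking) pairs, one per
    training problem. The k training problems closest to new_meta in
    Euclidean distance are selected and their rank positions averaged.
    This k-NN scheme is an illustrative assumption."""
    neighbours = sorted(
        meta_base, key=lambda entry: math.dist(entry[0], new_meta)
    )[:k]
    algorithms = list(neighbours[0][1])
    mean_rank = {
        a: sum(ranking[a] for _, ranking in neighbours) / len(neighbours)
        for a in algorithms
    }
    return sorted(algorithms, key=lambda a: (mean_rank[a], a))


if __name__ == "__main__":
    # Hypothetical meta-base: meta-attributes and rankings of three problems.
    meta_base = [
        ([100, 2], {"alg_A": 1, "alg_B": 2, "alg_C": 3}),
        ([120, 3], {"alg_A": 2, "alg_B": 1, "alg_C": 3}),
        ([5000, 0], {"alg_A": 3, "alg_B": 2, "alg_C": 1}),
    ]
    print(recommend_ranking(meta_base, [4800, 0], k=2))
```

In this sketch the raw meta-attributes are compared directly, so the number of objects dominates the distance; in practice the meta-attributes would likely be normalised before measuring similarity.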