B.K. Panigrahi et al. (Eds.): SEMCCO 2012, LNCS 7677, pp. 143–150, 2012. © Springer-Verlag Berlin Heidelberg 2012

Clustering Algorithm Recommendation: A Meta-learning Approach

Daniel G. Ferrari and Leandro Nunes de Castro

Natural Computing Laboratory (LCoN)
Mackenzie Presbyterian University, São Paulo, Brazil
ferrari.dg@gmail.com, lnunes@mackenzie.br

Abstract. Meta-learning is a technique that aims at understanding what types of algorithms solve what kinds of problems. Clustering divides a dataset into groups based on the objects' similarities, without requiring prior knowledge of the objects' labels. The present paper proposes the use of meta-learning to recommend clustering algorithms based on the feature extraction of unlabelled objects. The features of the clustering problems will be evaluated along with the ranking of different algorithms so that the meta-learning system can accurately recommend the best algorithms for a new problem.

Keywords: clustering, algorithm recommendation, ranking, meta-learning.

1 Introduction

A huge amount of information is currently represented and stored as data for later analysis [1]. Researchers began to dedicate themselves to developing methods to extract knowledge from data; the process of applying these methods is known as data mining [2].

Nowadays, data mining tools offer a variety of algorithms able to solve each of the many data mining tasks. However, this process suffers from a lack of guidelines for selecting the best algorithm to solve a given data mining problem [3].

The meta-learning field aims to find which problem features contribute to better or worse performance of an algorithm [4] and, from this, to recommend the most appropriate algorithm for solving a given problem [3].
To reach this objective, meta-learning builds two key sets: (1) meta-attributes, the set of features common to several instances of a class of problems, such as the number of objects and the number of binary attributes; and (2) ranking, the set of rank positions, based on a performance evaluation, of several algorithms applied to the same problems. From these two sets, a model is built to recommend the ranking of the algorithms on other problems, not used for training, based on the proposed meta-attributes.
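The two sets and the recommendation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the algorithm names ("k-means", "single-linkage"), and the performance scores are hypothetical placeholders, and the nearest-neighbour recommender is an assumption chosen only because it is simple. Only the two meta-attributes named in the text (number of objects and number of binary attributes) are computed.

```python
import math


def meta_attributes(dataset):
    """Extract two illustrative meta-attributes from an unlabelled dataset.

    dataset: list of rows, each a list of numeric attribute values.
    Returns (number of objects, number of binary attributes).
    An attribute counts as binary here if all its values are 0 or 1.
    """
    n_objects = len(dataset)
    n_attrs = len(dataset[0]) if dataset else 0
    n_binary = sum(
        1 for j in range(n_attrs)
        if {row[j] for row in dataset} <= {0, 1}
    )
    return (n_objects, n_binary)


def ranking(scores):
    """Map {algorithm: score} to {algorithm: rank}; rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def recommend(meta_base, new_meta, k=1):
    """Recommend an ordering of algorithms for a new problem.

    Assumption: the model is a k-nearest-neighbour lookup that averages the
    rankings of the k training problems closest in meta-attribute space.
    """
    nearest = sorted(
        meta_base, key=lambda entry: math.dist(entry[0], new_meta)
    )[:k]
    algorithms = nearest[0][1].keys()
    avg_rank = {
        a: sum(ranks[a] for _, ranks in nearest) / len(nearest)
        for a in algorithms
    }
    return sorted(avg_rank, key=avg_rank.get)


# Hypothetical training problems: (dataset, placeholder performance scores).
train = [
    ([[0, 1.5], [1, 2.0], [0, 3.1]],
     {"k-means": 0.8, "single-linkage": 0.5}),
    ([[0, 1], [1, 0], [1, 1], [0, 0], [1, 1], [0, 1]],
     {"k-means": 0.4, "single-linkage": 0.7}),
]

# Meta-base: one (meta-attributes, ranking) pair per training problem.
meta_base = [(meta_attributes(d), ranking(s)) for d, s in train]

# A new problem that was not used for training.
new_problem = [[1, 0.2], [0, 0.9], [1, 0.4], [0, 0.1]]
recommended = recommend(meta_base, meta_attributes(new_problem))
print(recommended)
```

In practice the meta-attributes would need to be normalised before computing distances, since raw counts such as the number of objects can dominate the other features.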