Mining Rules for the Automatic Selection Process of Clustering Methods Applied to Cancer Gene Expression Data

André C.A. Nascimento 1, Ricardo B.C. Prudêncio 1, Marcilio C.P. de Souto 2, and Ivan G. Costa 1

1 Center of Informatics, Federal University of Pernambuco, Recife, Brazil
{acan,rbcp,igcf}@cin.ufpe.br
2 Dept. of Informatics and Applied Mathematics, Fed. Univ. of Rio Grande do Norte, Natal, Brazil
marcilio@dimap.ufrn.br

Abstract. Different algorithms have been proposed in the literature to cluster gene expression data; however, no single algorithm can be considered the best independently of the data. In this work, we applied the concepts of Meta-Learning to relate features of gene expression data sets to the performance of clustering algorithms. In our context, each meta-example consists of descriptive features of a gene expression data set and a label indicating the clustering algorithm that performed best on that data set. A set of such meta-examples is given as input to a learning technique (the meta-learner), which is responsible for acquiring knowledge relating the descriptive features to the best algorithms. We performed experiments on a case study in which a meta-learner was applied to discriminate among three competing algorithms for clustering cancer gene expression data. In this case study, a set of meta-examples was generated by applying the algorithms to 30 different cancer data sets. The knowledge extracted by the meta-learner was useful for understanding the suitability of each clustering algorithm for specific problems.

1 Introduction

New biotechnology methodologies, such as microarrays, allow the measurement of the expression of all genes of a cell sample. Medical researchers can use such methodologies to measure gene expression in cancer cell samples from several patients with distinct cancer types.
With these data, machine learning methods can be applied to perform computational diagnosis, i.e., to classify the type of a cancer cell based only on its gene expression profile. Another analysis of particular interest is the application of clustering to search for cancer tissues sharing similar molecular signatures. As demonstrated in [1] and [2], this kind of analysis not only allows distinct cancer types to be distinguished, but has also led to the discovery of new cancer sub-types. Such gene expression data sets impose

C. Alippi et al. (Eds.): ICANN 2009, Part II, LNCS 5769, pp. 20–29, 2009. © Springer-Verlag Berlin Heidelberg 2009