Learning Similarity Metric to improve the performance of Lazy Multi-label Ranking Algorithms

Oscar Reyes, Computer Sciences Department, University of Holguín, Holguín, Cuba, oreyesp@facinf.uho.edu.cu
Carlos Morell, Computer Sciences Department, Universidad Central de Las Villas, Santa Clara, Cuba, cmorellp@uclv.edu.cu
Sebastián Ventura, Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain, sventura@uco.es

Abstract—The definition of similarity metrics is one of the most important tasks in the development of nearest-neighbour and instance-based learning methods. Furthermore, the performance of lazy algorithms can be significantly improved by using an appropriate weight vector. In recent years, learning from multi-label data has attracted significant attention from many researchers, motivated by the increasing number of modern applications that contain this type of data. This paper presents a new feature-weighting method that uses a similarity metric as a heuristic to estimate the feature weights, improving the performance of lazy multi-label ranking algorithms. The experimental stage shows the effectiveness of our proposal.

Keywords—multi-label ranking; lazy learning algorithms; feature weighting; similarity metric

I. INTRODUCTION

Many researchers in the field of supervised learning deal with the analysis of single-label data, where training examples are associated with a single label from a set of disjoint labels L. For single-label data there are two problems: binary classification when |L| = 2 and multi-class classification when |L| > 2. However, current data in several application domains are often associated with a set of labels Y ⊆ L, known as multi-label data. There are two major tasks in supervised learning from multi-label data: Multi-Label Classification (MLC) and Label Ranking (LR). In MLC, the set of labels is divided into relevant and irrelevant labels with respect to a query instance.
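The distinction between single-label and multi-label data can be illustrated with a small sketch; the label space and instances below are hypothetical, with each label set Y ⊆ L encoded as a binary indicator vector:

```python
# Hypothetical label space L for a text-categorization task.
L = ["sports", "politics", "music"]

def to_indicator(Y, label_space=L):
    """Encode a label subset Y of L as a 0/1 indicator vector."""
    return [1 if label in Y else 0 for label in label_space]

# Single-label example: exactly one relevant label.
print(to_indicator({"sports"}))           # [1, 0, 0]
# Multi-label example: a subset of labels is relevant.
print(to_indicator({"sports", "music"}))  # [1, 0, 1]
```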
Binary and multi-class classification can be considered special cases of MLC; however, the generality of multi-label problems inevitably makes the learning process more difficult [1]. On the other hand, LR is concerned with learning a model that orders the class labels according to their relevance to a query instance. MLC and LR are important in mining multi-label data, and the generalization of these two problems has been called multi-label ranking (MLR) [2]. An increasing number of modern applications contain multi-label data, such as text categorization [3], emotions evoked by music [4], semantic annotation of images [5] and videos [6], and classification of protein and gene function [7].

Several methods have been proposed for MLC; however, not all of them can deal with LR problems. These methods can be grouped into two categories: problem transformation and algorithm adaptation. The former are algorithm independent, transforming the learning task into one or more single-label classification tasks, while the latter extend specific learning algorithms in order to handle multi-label data directly [8]. The methods Copy, Copy-Weight, Select-Max, Select-Min, Select-Random, Ignore, Label Power Set (LPS) and the Pruned Problem Transformation method (PPT) are examples of problem transformation methods that convert a multi-label data set into a single-label data set. On the other hand, Binary Relevance (BR), Ranking by Pairwise Comparison (RPC), the Multi-label Pairwise Perceptron (MLPP, an instantiation of RPC using perceptrons for the binary classification tasks), Calibrated Label Ranking (CLR) and Random k-labelsets (RAkEL) transform a multi-label data set into several single-label data sets [9].

In the second group are INSDIF [10] and the Multi-Class Multi-Label Perceptron (MMP) [11]. SVM and C4.5 were adapted in [12] and [13], respectively. Extensions of AdaBoost (AdaBoost.MH and AdaBoost.MR) appeared in [14], and the Back-Propagation algorithm was adapted to MLC (BP-MLL) in [7].
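The problem-transformation idea can be illustrated with Binary Relevance, which splits a multi-label data set into one binary data set per label; the data and helper below are made up for the example:

```python
def binary_relevance(X, Y, label_space):
    """Transform a multi-label data set (X, Y) into |L| binary
    single-label data sets, one per label: an instance is a
    positive example for label l iff l belongs to its label set."""
    datasets = {}
    for l in label_space:
        datasets[l] = [(x, 1 if l in y else 0) for x, y in zip(X, Y)]
    return datasets

# Three toy instances with label sets drawn from L = {"a", "b"}.
X = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
Y = [{"a"}, {"a", "b"}, {"b"}]
br = binary_relevance(X, Y, ["a", "b"])
# br["a"] pairs each instance with 1 iff label "a" is relevant:
# [([0.1, 0.9], 1), ([0.8, 0.2], 1), ([0.5, 0.5], 0)]
```

Each of the resulting binary data sets can then be handled by any standard single-label learner, which is what makes the transformation algorithm independent.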
A gene expression programming algorithm for MLC was proposed in [15]. Lazy learning algorithms have also been developed, such as MLkNN [1], BRkNN [16] and IBLR-ML [17]. Lazy learning algorithms are based on the notion of similarity or distance in the feature space that describes the data; nearest-neighbour and instance-based learning are examples of similarity-based methods. As far as similarity-based methods are concerned, it is desirable to find the instances most similar to the query instance, so that the inference process minimizes the number of incorrectly predicted labels [18].

The first step in any lazy learning algorithm for MLC is the same as in kNN for single-label data: retrieving the k examples nearest to a query instance. What differentiates these algorithms is how they aggregate the label sets of these examples. MLkNN uses the maximum a posteriori principle in order to determine the label set of the query instance, based on prior and posterior probabilities estimated from the training set.

978-1-4673-5119-5/12/$31.00 © 2012 IEEE
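The two-step scheme described above (neighbour retrieval followed by label-set aggregation) can be sketched as follows. The weighted Euclidean distance mirrors the role a learned weight vector plays in the metric; the weights, data and the simple majority-vote aggregation (in the spirit of BRkNN rather than MLkNN's MAP rule) are illustrative only:

```python
import math

def weighted_distance(a, b, w):
    """Euclidean distance with a per-feature weight vector w."""
    return math.sqrt(sum(wi * (ai - bi) ** 2
                         for wi, ai, bi in zip(w, a, b)))

def knn_label_sets(query, X, Y, w, k):
    """Step 1: retrieve the label sets of the k training
    examples nearest to the query under the weighted metric."""
    ranked = sorted(range(len(X)),
                    key=lambda i: weighted_distance(query, X[i], w))
    return [Y[i] for i in ranked[:k]]

def majority_aggregate(label_sets, label_space):
    """Step 2: a label is predicted relevant when it appears in
    more than half of the neighbours' label sets."""
    k = len(label_sets)
    return {l for l in label_space
            if sum(l in s for s in label_sets) > k / 2}

# Toy usage: two nearest neighbours both contain label "a".
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [{"a"}, {"a", "b"}, {"b"}]
pred = majority_aggregate(knn_label_sets([0.0, 0.2], X, Y, [1.0, 1.0], 2),
                          ["a", "b"])
print(pred)  # {'a'}
```

Raising the weight of a feature stretches the space along that axis, changing which neighbours are retrieved; this is precisely why an appropriate weight vector can improve a lazy learner's predictions.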