Multi-label learning based on iterative label propagation over graph

Bin Fu a,b, Zhihai Wang a,*, Guandong Xu b, Longbing Cao b
a School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
b Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia

Article history: Received 18 August 2013; Available online 17 January 2014

Keywords: Multi-label learning; Label dependency; Random walk with restart; Label propagation

abstract

One key challenge in multi-label learning is how to exploit label dependency effectively. Existing methods mainly address this issue by training a prediction model for each label based on the combination of the original features and the labels on which it depends. However, in this way the influence of label dependency might be suppressed by the significant imbalance between the dimensionality of the feature set and that of the dependent label set, and the dynamic interaction between labels cannot be utilized effectively. In this paper, we propose a new framework to exploit the dependencies between labels iteratively and interactively. Every label's prediction is updated through an iterative propagation process, rather than being determined directly by a prediction model. Specifically, we utilize a graph model to encode the dependencies between labels, and employ the random walk with restart (RWR) strategy to propagate the dependency among all labels iteratively until the predictions for all labels converge. We validate our approach by experiments, and the results demonstrate that it yields significant improvements compared with several state-of-the-art algorithms. © 2014 Published by Elsevier B.V.

1. Introduction

In traditional supervised learning, an instance is usually assumed to be associated with only one class label. For example, an email is either spam or not, and a patient either suffers from a heart attack or is healthy.
Learning to predict the label for this type of instance is thus called single-label learning. However, instances in a variety of real applications can be assigned multiple labels simultaneously, e.g., a news report may fall into both the technology and entertainment categories in document analysis; a film could be tagged with thriller, action, and crime in film recommendation. Accordingly, learning models that can predict multiple labels simultaneously for an instance is called multi-label learning. Nowadays, multi-label learning has attracted increasing attention with its ubiquitous applications, including text categorization [16], gene function prediction [3], music emotion analysis [18], and personal recommendation [17]. Usually, we use a set of features to characterize an instance from different aspects; thus an instance with multiple labels can be represented by a vector <x1, x2, ..., xd, y1, y2, ..., ym>, where x1...d are its features and y1...m are its labels. The multi-label learning process is thus essentially to learn a function f(X1, X2, ..., Xd) → (Y1, Y2, ..., Ym), which takes an instance's features as input and predicts a possible set of labels for it. The paradigm of multi-label learning primarily consists of two key phases: (1) training the predictive model on a training set consisting of labeled instances, and (2) predicting labels for unknown instances using the model. Various models complying with this paradigm have been proposed, e.g., binary relevance (BR), label powerset (LP) [19], random k-labelsets [20], etc. However, there are still some potential issues that need further exploration, of which a crucial one is how to exploit the dependency between labels to learn more effective models. Roughly speaking, label dependency refers to the fact that the prediction of a label may depend on other labels besides the feature values.
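To make the paradigm concrete, the following is a minimal sketch of binary relevance (BR), which learns the function f as m independent binary classifiers, one per label. The class name, the nearest-centroid base learner, and the toy data are our own illustrative choices, not from the paper; BR admits any base classifier.

```python
import numpy as np

class BinaryRelevance:
    """Binary relevance sketch: one independent binary classifier per
    label, here a simple nearest-centroid rule as the base learner.
    Note that BR ignores label dependency entirely, which is exactly
    the limitation the paper targets."""

    def fit(self, X, Y):
        # For each label j, store the centroids of its negative and
        # positive training instances.
        self.centroids = []
        for j in range(Y.shape[1]):
            neg = X[Y[:, j] == 0].mean(axis=0)
            pos = X[Y[:, j] == 1].mean(axis=0)
            self.centroids.append((neg, pos))
        return self

    def predict(self, X):
        # Predict label j as 1 when the instance is closer to the
        # positive centroid than to the negative one.
        cols = []
        for neg, pos in self.centroids:
            d_pos = np.linalg.norm(X - pos, axis=1)
            d_neg = np.linalg.norm(X - neg, axis=1)
            cols.append((d_pos < d_neg).astype(int))
        return np.column_stack(cols)

# Toy data: 2 features, 2 labels (label 0 mirrors feature 0, label 1 feature 1).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
pred = BinaryRelevance().fit(X, Y).predict(X)
```

Each label is fit and predicted in isolation, so any correlation between Y1 and Y2 is invisible to the model.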
A toy example of the relationship between features and labels is shown in Fig. 1, which indicates that, for example, label Y1 is determined not only by the features but also by label Y2. In practice, labels might be positively or negatively dependent on each other, e.g., a picture of the sea is likely also an outdoor picture, whereas a political report could not be an entertainment report. Obviously, exploiting such information would result in more accurate models. A multitude of methods with this motivation have been proposed, such as classifier chains [14], IBLR-ML+ [2], LEAD [23], and others summarized in [5] according to the types of label dependency they exploit. However, most existing methods exploit label dependency via training a predictive model for each label based on the

---
http://dx.doi.org/10.1016/j.patrec.2014.01.001
0167-8655/© 2014 Published by Elsevier B.V.
This paper has been recommended for acceptance by A. Marcelli.
* Corresponding author. E-mail addresses: Bin.Fu@student.uts.edu.au (B. Fu), zhhwang@bjtu.edu.cn (Z. Wang), guandong.xu@uts.edu.au (G. Xu), Longbing.Cao@uts.edu.au (L. Cao).
Pattern Recognition Letters 42 (2014) 85–90