Representation Learning for Sparse, High Dimensional Multi-Label Classification

Ryan Kiros 1, Axel J. Soto 2, Evangelos Milios 2, and Vlado Keselj 2

1 Department of Computing Science, University of Alberta, Edmonton, Canada - rkiros@ualberta.ca
2 Faculty of Computer Science, Dalhousie University, Halifax, Canada - {soto,eem,vlado}@cs.dal.ca

Abstract. In this article we describe the approach we applied for the JRS 2012 Data Mining Competition. The task of the competition was the multi-labelled classification of biomedical documents. Our method is motivated by recent work in the machine learning and computer vision communities that highlights the usefulness of feature learning for classification tasks. Our approach uses orthogonal matching pursuit to learn a dictionary from PCA-transformed features. Binary relevance with logistic regression is applied to the encoded representations, leading to a fifth-place performance in the competition. To show the suitability of our approach beyond the competition task, we also report state-of-the-art classification performance on the multi-label ASRS dataset.

Keywords: Multi-label classification, feature learning, text mining

1 Introduction

Different representations for text corpora have been extensively studied, with TF-IDF and Okapi BM25 being two of the most common ways of representing data [9]. The choice of document representation has a major impact on performance in tasks such as document classification or retrieval. In particular, multi-labelled document classification is a research problem that has received much less attention in the literature than its single-labelled counterpart. Yet multi-labelled classification is in many cases a more natural approach for document classification tasks [13].
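The two weighting schemes mentioned above can be sketched on a toy term-count matrix. This is a minimal illustration, not code from the paper; the matrix values and the particular BM25 smoothing constants are assumptions for the example.

```python
import numpy as np

# Toy term-count matrix: 3 documents x 4 terms (illustrative values).
counts = np.array([
    [2, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 3],
], dtype=float)

def tfidf(counts):
    """Plain TF-IDF: tf * log(N / df). A term occurring in every
    document gets idf = 0 and is therefore zeroed out."""
    N = counts.shape[0]
    df = (counts > 0).sum(axis=0)           # document frequency per term
    idf = np.log(N / df)
    return counts * idf

def bm25(counts, k1=1.5, b=0.75):
    """Okapi BM25 weighting with common k1/b defaults and a
    smoothed idf (one of several variants in use)."""
    N = counts.shape[0]
    df = (counts > 0).sum(axis=0)
    idf = np.log((N - df + 0.5) / (df + 0.5) + 1.0)
    doclen = counts.sum(axis=1, keepdims=True)
    avgdl = doclen.mean()
    tf = counts * (k1 + 1) / (counts + k1 * (1 - b + b * doclen / avgdl))
    return tf * idf

X = tfidf(counts)   # note: term 2 appears in all documents, so its column is 0
B = bm25(counts)
```

Note that the plain-idf variant discards terms that appear in every document, while the smoothed BM25 idf keeps them with a small positive weight.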
The machine learning community, especially in the area of computer vision, has witnessed the importance of learning feature representations as an alternative to manually configuring the best data representation to feed into a prediction method. Learned representations coupled with simple classification methods usually achieve comparable classification accuracy to, and even outperform, more complex classification methods [3, 4, 14]. The JRS 2012 Data Mining Competition represented a great opportunity to benchmark and evaluate our hypothesis that a learned representation of the data combined with a standard classification approach could perform competitively against other approaches to multi-label text classification.
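As a rough illustration of this kind of pipeline, the following numpy-only sketch combines a PCA projection, a greedy OMP-style sparse encoding against a fixed dictionary, and binary relevance with one logistic regression per label. The synthetic data, the dictionary built from sampled training points, and all hyperparameters are assumptions for the sketch; the paper's actual dictionary learning procedure is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for document features: 100 samples, 50 dims, 3 labels.
X = rng.standard_normal((100, 50))
Y = (rng.random((100, 3)) < 0.3).astype(float)

# 1) PCA: center the data and project onto the top principal components.
def pca_fit_transform(X, n_components=20):
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt[:n_components].T

Z = pca_fit_transform(X)

# 2) Dictionary: normalized sampled training points, a simple stand-in
#    for a learned dictionary.
atoms = Z[rng.choice(len(Z), 30, replace=False)].T        # (dims, n_atoms)
D = atoms / np.linalg.norm(atoms, axis=0)

# 3) OMP encoding: greedily pick the most correlated atom, then refit
#    all selected coefficients by least squares.
def omp_encode(D, x, k=5):
    residual, idx = x.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in idx:
            idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        residual = x - D[:, idx] @ coef
    code = np.zeros(D.shape[1])
    code[idx] = coef
    return code

codes = np.array([omp_encode(D, z) for z in Z])

# 4) Binary relevance: an independent logistic regression per label,
#    trained with plain batch gradient descent.
def fit_logreg(X, y, lr=0.1, steps=300):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

W = np.column_stack([fit_logreg(codes, Y[:, j]) for j in range(Y.shape[1])])
probs = 1.0 / (1.0 + np.exp(-codes @ W))   # per-label probabilities
```

Each code vector has at most k nonzero entries, so the classifier operates on a sparse encoding of the PCA features rather than on the raw representation.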