Representation Learning for Sparse, High Dimensional Multi-Label Classification

Ryan Kiros 1, Axel J. Soto 2, Evangelos Milios 2, and Vlado Keselj 2

1 Department of Computing Science, University of Alberta, Edmonton, Canada - rkiros@ualberta.ca
2 Faculty of Computer Science, Dalhousie University, Halifax, Canada - {soto,eem,vlado}@cs.dal.ca

Abstract. In this article we describe the approach we applied for the JRS 2012 Data Mining Competition. The task of the competition was the multi-labelled classification of biomedical documents. Our method is motivated by recent work in the machine learning and computer vision communities that highlights the usefulness of feature learning for classification tasks. Our approach uses orthogonal matching pursuit to learn a dictionary from PCA-transformed features. Binary relevance with logistic regression is applied to the encoded representations, leading to a fifth-place performance in the competition. To show the suitability of our approach beyond the competition task, we also report state-of-the-art classification performance on the multi-label ASRS dataset.

Keywords: Multi-label classification, feature learning, text mining

1 Introduction

Different representations for text corpora have been extensively studied, with TF-IDF and Okapi BM25 being two of the most common ways of representing data [9]. The choice of document representation has a major impact on performance in tasks such as document classification or retrieval. In particular, multi-labelled document classification is a research problem that has received much less attention in the literature than its single-labelled counterpart. Yet multi-labelled classification is in many cases a more natural approach for document classification tasks [13].
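The two weighting schemes mentioned above can be sketched on a toy term-count matrix. This is a minimal illustration, not code from the paper; the matrix values and the particular BM25 smoothing constants are assumptions for the example.

```python
import numpy as np

# Toy term-count matrix: 3 documents x 4 terms (illustrative values).
counts = np.array([
    [2, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 3],
], dtype=float)

def tfidf(counts):
    """Plain TF-IDF: tf * log(N / df). A term occurring in every
    document gets idf = 0 and is therefore zeroed out."""
    N = counts.shape[0]
    df = (counts > 0).sum(axis=0)           # document frequency per term
    idf = np.log(N / df)
    return counts * idf

def bm25(counts, k1=1.5, b=0.75):
    """Okapi BM25 weighting with common k1/b defaults and a
    smoothed idf (one of several variants in use)."""
    N = counts.shape[0]
    df = (counts > 0).sum(axis=0)
    idf = np.log((N - df + 0.5) / (df + 0.5) + 1.0)
    doclen = counts.sum(axis=1, keepdims=True)
    avgdl = doclen.mean()
    tf = counts * (k1 + 1) / (counts + k1 * (1 - b + b * doclen / avgdl))
    return tf * idf

X = tfidf(counts)   # note: term 2 appears in all documents, so its column is 0
B = bm25(counts)
```

Note that the plain-idf variant discards terms that appear in every document, while the smoothed BM25 idf keeps them with a small positive weight.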
The machine learning community, especially in the area of computer vision, has witnessed the importance of learning feature representations as an alternative to manually configuring the best data representation to feed into a prediction method. Learned representations coupled with simple classification methods usually achieve comparable classification accuracy to, and even outperform, more complex classification methods [3, 4, 14]. The JRS 2012 Data Mining Competition represented a great opportunity to benchmark and evaluate our hypothesis that a learned representation of the data combined with a standard classification approach could perform competitively against other approaches to multi-label text classification.
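As a rough illustration of this kind of pipeline, the following numpy-only sketch combines a PCA projection, a greedy OMP-style sparse encoding against a fixed dictionary, and binary relevance with one logistic regression per label. The synthetic data, the dictionary built from sampled training points, and all hyperparameters are assumptions for the sketch; the paper's actual dictionary learning procedure is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for document features: 100 samples, 50 dims, 3 labels.
X = rng.standard_normal((100, 50))
Y = (rng.random((100, 3)) < 0.3).astype(float)

# 1) PCA: center the data and project onto the top principal components.
def pca_fit_transform(X, n_components=20):
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt[:n_components].T

Z = pca_fit_transform(X)

# 2) Dictionary: normalized sampled training points, a simple stand-in
#    for a learned dictionary.
atoms = Z[rng.choice(len(Z), 30, replace=False)].T        # (dims, n_atoms)
D = atoms / np.linalg.norm(atoms, axis=0)

# 3) OMP encoding: greedily pick the most correlated atom, then refit
#    all selected coefficients by least squares.
def omp_encode(D, x, k=5):
    residual, idx = x.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in idx:
            idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        residual = x - D[:, idx] @ coef
    code = np.zeros(D.shape[1])
    code[idx] = coef
    return code

codes = np.array([omp_encode(D, z) for z in Z])

# 4) Binary relevance: an independent logistic regression per label,
#    trained with plain batch gradient descent.
def fit_logreg(X, y, lr=0.1, steps=300):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

W = np.column_stack([fit_logreg(codes, Y[:, j]) for j in range(Y.shape[1])])
probs = 1.0 / (1.0 + np.exp(-codes @ W))   # per-label probabilities
```

Each code vector has at most k nonzero entries, so the classifier operates on a sparse encoding of the PCA features rather than on the raw representation.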