Multi-label Hierarchical Text Classification using the ACM Taxonomy

António Paulo Santos 1, Fátima Rodrigues 1

1 GECAD – Knowledge Engineering and Decision Support Group, Institute of Engineering – Polytechnic of Porto, Portugal
{pgsa, mfc}@isep.ipp.pt

Abstract. Many text classification works assign each text a single class label from a predefined set of classes that is usually small and flat (flat classification). However, in more complex classification problems each text can be assigned more than one class (multi-label classification), and the classes can be organized in a hierarchical structure (hierarchical classification) that supports thematic searches by browsing topics of interest. In this paper, a multi-label hierarchical text classification problem is presented. The experiment involves the creation of a multi-label hierarchical text collection, its pre-processing, the application of different classifiers to the collection, and finally, the evaluation of the classifiers' performance.

Keywords: multi-label hierarchical text classification; ACM taxonomy

1 Introduction

Text classification consists of assigning one or more predefined categories to text documents based on their content. More formally, given a set of categories $C = \{C_1, C_2, \ldots, C_{|C|}\}$ and a set of classified documents $D = \{d_1, d_2, \ldots, d_{|D|}\}$, the goal is to use a learning method or algorithm to build a classifier, or classification function, that maps documents to categories. The classifier is then used to classify new documents that have not yet been classified.

Multi-label hierarchical classification of documents is the task of assigning any number of classes, organized in a hierarchical structure, to text documents. The literature contains many contributions on multi-label classification and many on hierarchical classification.
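To make the definition concrete, the following Python sketch represents a multi-label hierarchical assignment as a set of category codes that is closed under ancestors. The taxonomy fragment, its dotted codes, the `PARENT` table, and the helpers `ancestors` and `close_under_ancestors` are hypothetical illustrations, not the ACM taxonomy itself nor the authors' method. The rule that a document labeled with a category also carries every ancestor of that category is a common convention in hierarchical classification and is assumed here rather than taken from the paper.

```python
from typing import Dict, Optional, Set

# Hypothetical taxonomy fragment (illustrative codes only): each category
# code maps to its parent, and None marks a top-level category.
PARENT: Dict[str, Optional[str]] = {
    "H": None,
    "H.3": "H",
    "H.3.3": "H.3",
    "I": None,
    "I.2": "I",
    "I.2.6": "I.2",
    "I.2.7": "I.2",
}


def ancestors(code: str) -> Set[str]:
    """Return every proper ancestor of `code` by walking up the PARENT table."""
    result: Set[str] = set()
    parent = PARENT[code]
    while parent is not None:
        result.add(parent)
        parent = PARENT[parent]
    return result


def close_under_ancestors(labels: Set[str]) -> Set[str]:
    """Expand a multi-label assignment so that every assigned category also
    carries all of its ancestors (an assumed hierarchy constraint)."""
    closed = set(labels)
    for code in labels:
        closed |= ancestors(code)
    return closed


# One document assigned two categories from different branches of the tree:
# the result is a multi-label, hierarchy-consistent subset of C.
doc_labels = close_under_ancestors({"I.2.7", "H.3.3"})
print(sorted(doc_labels))
```

Here the document receives two labels at once (multi-label), and each label drags in its path to the root (hierarchical), so the classifier output is a subset of $C$ rather than a single category.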
However, only a few contributions address the combination of these two problems; they are based on AI techniques and have some limitations.

Multi-label classification methods have been categorized into two groups [16]: problem transformation methods and algorithm adaptation methods. Methods in the first group are algorithm independent: they transform the multi-label classification task into one or more single-label classification tasks. The methods of the second group extend specific