Abstract: Hierarchical classification is a problem with applications in many areas, such as protein function prediction, where the data are hierarchically structured. It is therefore necessary to develop algorithms able to induce hierarchical classification models. This paper presents experiments using the hierarchical classification algorithm called Multi-label Hierarchical Classification using a Competitive Neural Network (MHC-CNN). It was tested on ten datasets from the Gene Ontology (GO) Cellular Component domain. The results are compared with those of Clus-HMC and Clus-HSC using the hF-Measure.
Keywords: Hierarchical Classification, Competitive Neural Network, Global Classifier.
I. INTRODUCTION
HIERARCHICAL classification is a data mining task that has been applied in diverse areas such as music prediction [28], [29], [4], images [30], and text (work place), among others. In bioinformatics, it has been used for the functional prediction of proteins, since this is not an easy task to accomplish without the help of efficient techniques. The prediction of protein functions can be treated as a classification problem in data mining, in which the attributes of a protein form a sample in the database and its biological functions are the classes (multi-class classifiers) [2]. Most algorithms for multi-label hierarchical classification of proteins have been developed to support class hierarchies with a tree structure, but ontologies such as the Gene Ontology (GO) [13], [16], [20] have also been used to predict protein functions. The GO terms are organized in hierarchies structured as a directed acyclic graph (DAG), in which a "child" term may be connected to one or more "parent" terms. The classification algorithms developed to support this type of structure typically do not consider the class hierarchy as a whole (the global or big-bang approach), which may affect the predictions made for the samples.
In this paper, an algorithm for hierarchical classification of data in DAG-structured hierarchies, developed by Borges and Nievola [26] and named MHC-CNN (Multi-label Hierarchical Classification using a Competitive Neural Network), is applied. The experiments focus on hierarchical protein function prediction using the GO Cellular Component domain, with the aim of verifying the behavior of the classifier.
Helyane B. Borges is with the Universidade Tecnológica Federal do Paraná, Brazil (e-mail: helyane@utfpr.edu.br).
Julio Cesar Nievola is with the Pontifícia Universidade Católica do Paraná, Brazil (e-mail: nievola@ppgia.pucpr.br).
II. HIERARCHICAL CLASSIFICATION
Hierarchical classification differs from flat classification because the classes are organized in a hierarchy structured as a tree or a DAG, where the nodes of the hierarchy represent the classes involved in the classification process. The main difference between the two structures is that in a tree each node (each class), except the root node, has exactly one parent, while in a DAG each node (class) can have one or more parents.
Another characteristic that distinguishes flat classification from hierarchical classification is the type of class prediction in the hierarchy, which falls into two categories: mandatory leaf node prediction (possible in flat or hierarchical classification) and non-mandatory leaf node prediction (possible only in hierarchical classification). In mandatory leaf node prediction, all examples must be associated with classes represented by leaf nodes. In non-mandatory leaf node prediction, there is no requirement that the prediction occur at leaf nodes. Thus, the examples may be associated with classes represented by any internal node of the class hierarchy, along with the ancestors of that node.
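The distinction between tree and DAG hierarchies, and between mandatory and non-mandatory leaf node prediction, can be illustrated with a minimal Python sketch. The toy hierarchy `PARENTS` and the helper names (`ancestors`, `is_tree`, `leaves`, `most_specific`, `is_leaf_prediction`) are hypothetical and chosen only for illustration; they are not taken from the GO or from MHC-CNN.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each class maps to the list of its parents.
# Class "D" has two parents, so the structure is a DAG rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every class reachable by following parent links from node."""
    found: Set[str] = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def is_tree(parents: Dict[str, List[str]]) -> bool:
    """True when no class has more than one parent."""
    return all(len(p) <= 1 for p in parents.values())


def leaves(parents: Dict[str, List[str]]) -> Set[str]:
    """Classes that are not the parent of any other class."""
    internal = {p for ps in parents.values() for p in ps}
    return set(parents) - internal


def most_specific(labels: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Drop labels that are ancestors of another label in the same set."""
    covered: Set[str] = set()
    for label in labels:
        covered |= ancestors(label, parents)
    return set(labels) - covered


def is_leaf_prediction(labels: Set[str], parents: Dict[str, List[str]]) -> bool:
    """True when the most specific predicted classes are all leaf nodes."""
    return most_specific(labels, parents) <= leaves(parents)


print(sorted(ancestors("D", PARENTS)))
print(is_tree(PARENTS))
print(is_leaf_prediction({"A", "C"}, PARENTS))  # most specific class is the leaf "C"
print(is_leaf_prediction({"A"}, PARENTS))       # stops at the internal node "A"
```

In this sketch, a prediction such as `{"A"}` is acceptable only under non-mandatory leaf node prediction, because its most specific class is an internal node.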
To address hierarchical classification problems, several solutions have been proposed, which can be divided into three main approaches: flat hierarchical classification, local hierarchical classification and global hierarchical classification [4]. These approaches describe how the classifiers are built, not a classification method such as the top-down approach, which is often cited in the literature as one of the approaches.
A. Flat Hierarchical Classification
Flat hierarchical classification behaves in the same way as a conventional classification algorithm in the training and testing phases. This approach considers that a hierarchical classification problem can be transformed into a flat classification problem by disregarding the concepts of ancestor and descendant, i.e., it ignores the class hierarchy and predicts only the leaf nodes. It is similar to conventional flat classification and can be applied to both tree and DAG structures.
B. Local Hierarchical Classification
Local hierarchical classification uses M independent local classifiers, each one dealing with the prediction of only one of the classes (M is the total number of nodes in the class hierarchy) [13]. Hence, the number of classifiers that must be trained can be very large when there are many classes.
Helyane B. Borges and Julio Cesar Nievola, "Multi-Label Hierarchical Classification for Protein Function Prediction," World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol:7, No:8, 2013. waset.org/Publication/16089
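The flat and local approaches described above differ in the training targets they derive from the same labeled examples. The following Python sketch contrasts the two under stated assumptions: the hierarchy `PARENTS`, the examples in `EXAMPLES` and the function names are hypothetical, and the root is skipped when building local targets because every example trivially belongs to it.

```python
from typing import Dict, List, Set

# Hypothetical toy DAG: each class maps to the list of its parents.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}

# Hypothetical examples, each given by its most specific classes.
EXAMPLES: List[Set[str]] = [{"C"}, {"D"}, {"B"}]


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every class reachable by following parent links from node."""
    found: Set[str] = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def leaf_classes(parents: Dict[str, List[str]]) -> Set[str]:
    """Classes that are not the parent of any other class."""
    internal = {p for ps in parents.values() for p in ps}
    return set(parents) - internal


def flat_targets(examples: List[Set[str]],
                 parents: Dict[str, List[str]]) -> List[Set[str]]:
    """Flat approach: ignore the hierarchy and keep only leaf classes."""
    leaf = leaf_classes(parents)
    return [labels & leaf for labels in examples]


def local_targets(examples: List[Set[str]],
                  parents: Dict[str, List[str]],
                  root: str = "root") -> Dict[str, List[int]]:
    """Local approach: one binary target vector per class node.

    An example is positive for a class when it is labeled with that class
    or with one of its descendants.
    """
    targets: Dict[str, List[int]] = {}
    for node in parents:
        if node == root:
            continue  # assumption: the root is not given its own classifier
        column: List[int] = []
        for labels in examples:
            implied = set(labels)
            for label in labels:
                implied |= ancestors(label, parents)
            column.append(1 if node in implied else 0)
        targets[node] = column
    return targets


print(flat_targets(EXAMPLES, PARENTS))
for name, column in local_targets(EXAMPLES, PARENTS).items():
    print(name, column)
```

Each key returned by `local_targets` corresponds to one binary classifier to be trained, so the number of classifiers grows with the number of classes. The flat targets, by contrast, lose the third example entirely, because its only label is an internal node.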