Improving the Performance of Predictive Clustering Tree Algorithm for Hierarchical Multi-Label Classification

Purvi Prajapati 1,∗ and Amit Thakkar 2
1 Assistant Professor, 2 Associate Professor
Information & Technology Department, Charusat University, Changa, Gujarat, India.
e-mail: purviprajapati.it@ecchanga.ac.in

Abstract. Multi-label classification is a variation of the single-label classification problem in which each instance is associated with more than one class label. The most commonly used approach to the multi-label classification problem is to transform it into single-label problems, where a binary classifier is learned independently for every possible class label. However, multi-label data generally exhibit relationships between labels, and this approach fails to take such relationships into consideration. In this type of classification, label correlations should be maintained. Label correlations can be represented either as tree-structured hierarchies or as DAG (Directed Acyclic Graph) structured hierarchies. This hierarchical arrangement of labels maintains the hierarchy constraint: once an instance belongs to some class, it automatically belongs to all of that class's superclasses. This paper presents several variations on the induction of decision trees using the Predictive Clustering Tree (PCT) algorithm for Hierarchical Multi-label Classification. We implemented the proposed algorithm and compared it with different variations of the existing algorithm on 12 yeast data sets annotated with FunCat (tree structure). The experimental analysis shows that the proposed algorithm outperforms all the other variations of Hierarchical Multi-label Classification algorithms that were compared.

Keywords: Data mining, Classification, Single label classification, Hierarchical multi-label classification, PCT (predictive clustering tree).

1. Introduction

Data mining is the process of discovering useful knowledge from large amounts of data stored in databases, data warehouses or other information repositories. In data mining, classification is one of the best-known tasks; it is used to predict the class of an unseen instance as accurately as possible. Most existing work addresses single-label classification, in which each instance is associated with a single class label. However, there are many classification tasks in which each instance can be associated with more than one class label. This area is known as multi-label classification. Figures 1 and 2 show an example of single-label classification and multi-label classification, respectively.

There are two basic ways to represent multiple labels: a flat structure and a hierarchical structure. In a flat structure, all class labels are arranged at the same level. In a hierarchical structure, class labels are organized in a hierarchy according to the correlations between labels [2]. In hierarchical classification, an instance that belongs to some class automatically belongs to all of its superclasses [1].

Hierarchical Multi-label Classification (HMC) problems combine the characteristics of both hierarchical and multi-label classification problems: (1) class labels are organized in a hierarchical structure (e.g. a tree or DAG structure); (2) examples may be associated with more than one class label [2,12]. In this paper, a tree structure is used for hierarchical classification. Applications of hierarchical multi-label classification are found in many areas, including text classification, functional genomics, image recognition, face recognition and emotion detection [9,10].

This paper is organized as follows. Section 2 discusses approaches to hierarchical multi-label classification and the multiple prediction tree using the PCT algorithm. Section 3 explains the proposed algorithm for HMC.
∗ Corresponding author. Elsevier Publications 2014.

Section 4 presents