Classification using Probabilistic Random Forest

Rajhans Gondane
System Science and Automation, Indian Institute of Science, Bangalore, India
rajhans.gondane@gmail.com

V. Susheela Devi
Computer Science and Automation, Indian Institute of Science, Bangalore, India
susheela@csa.iisc.ernet.in

Abstract—The probabilistic random forest is a classification model that chooses a subset of features for each tree in the random forest depending on the F-scores of the features. In other words, the probability of a feature being chosen for the feature subset increases with the F-score of the feature on the dataset. A larger F-score indicates that the feature is more discriminative. The features are drawn in a stochastic manner, so the expectation is that features with higher F-scores will appear in the chosen feature subset. The class label of a pattern is obtained by combining the decisions of all the decision trees by majority voting. Experimental results reported on a number of benchmark datasets demonstrate that the proposed probabilistic random forest achieves better performance than the random forest.

I. INTRODUCTION

In recent times the data stored in databases has been growing consistently. This growth requires techniques to transform the data into meaningful information and knowledge. The decision tree classifier is a supervised learning approach which selects a set of reducts from the feature space and generates decision rules based on known data. In a decision tree the goal is to minimize the number of decision rules while classifying the test data with high accuracy. The efficient ID3 algorithm, proposed in 1979, decides the reducts based on information gain. ID3 leads to inaccurate decisions when there is noise in the sample data, and if the decision tree is binary a long sequence of tests may be needed [1].
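ID3's information-gain criterion mentioned above can be sketched as follows (an illustrative sketch, not code from the paper; the function names are ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Reduction in entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    split = {}
    for v, y in zip(feature_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Toy example: a perfectly discriminative binary feature.
y = ['pos', 'pos', 'neg', 'neg']
x = [1, 1, 0, 0]
print(information_gain(y, x))  # 1.0 bit: the split removes all uncertainty
```

ID3 picks the feature with the highest information gain as the split point at each node, which is exactly where noisy samples can mislead it.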
In the random forest the concept of bagging is used to generate a diverse ensemble of classifiers [2], but the problem with applying only bagging is that the first splitting node of each decision tree tends to remain the same [3] (even if we sample the data with replacement). So the random forest uses bagging as well as randomly selected inputs at each node to grow each tree [3]. This extra randomness makes the accuracy of the random forest as good as AdaBoost and sometimes even better [3].

The roulette wheel selection strategy uses a roulette wheel mechanism to probabilistically select individuals based on some measure of their performance [4]. Roulette wheel selection is stochastic sampling with replacement (SSR), so it gives zero bias but a potentially unlimited spread [4]. Individuals are mapped one-to-one onto contiguous segments of the interval [0, 1]; an individual occupying a large segment of the roulette wheel is more likely to be selected than one occupying a small segment [4].

F-score is a simple and efficient feature ranking strategy [5]. The F-score of a feature is the ratio of the sum of the discrimination between the sets of different classes to the sum of the discrimination within the set of each class. A larger F-score indicates that the feature is more discriminative [5].

The probabilistic random forest (PRF) proposed by us is an ensemble learning model constructed using the concepts of F-score and roulette wheel selection. In PRF, instead of choosing the feature subset at random for every tree in the forest, we choose the feature subset by roulette wheel selection, so the more discriminative features have a greater probability of being selected into the feature subset.

II. BACKGROUND THEORY

A. Decision Tree

The decision tree classifier is a supervised learning method that is built from a set of training examples [6].
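The F-score-weighted roulette wheel sampling that PRF uses to draw feature subsets can be sketched as follows (a minimal sketch assuming a two-class problem; `f_score` and `roulette_subset` are hypothetical helper names, and the F-score expression follows the between-class over within-class form described above):

```python
import random

def f_score(pos, neg):
    """F-score of one feature for a two-class problem:
    between-class discrimination over within-class scatter [5]."""
    all_vals = pos + neg
    m = sum(all_vals) / len(all_vals)   # overall mean of the feature
    mp = sum(pos) / len(pos)            # mean over the positive class
    mn = sum(neg) / len(neg)            # mean over the negative class
    between = (mp - m) ** 2 + (mn - m) ** 2
    within = (sum((x - mp) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - mn) ** 2 for x in neg) / (len(neg) - 1))
    return between / within

def roulette_subset(scores, k, rng=random):
    """Draw k distinct feature indices with probability
    proportional to their F-scores (roulette wheel selection)."""
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        weights = [scores[i] for i in remaining]
        r = rng.random() * sum(weights)     # spin the wheel
        acc, pick = 0.0, remaining[-1]      # fallback guards float round-off
        for i, w in zip(remaining, weights):
            acc += w
            if r < acc:
                pick = i
                break
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

For example, with per-feature F-scores `[0.1, 0.9, 0.5]`, `roulette_subset(scores, 2)` draws two distinct indices, favouring feature 1 but not choosing it deterministically, which is how PRF keeps the trees diverse.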
The decision tree is constructed by a recursive top-down partitioning (divide-and-conquer) process that divides the search space into several subsets. At each step the best feature is selected as a split point so that the data in each descendant subset are purer than the data in the parent set, until the subsets can finally be assigned class labels. Each path from the root to a leaf node is known as a decision rule, and the subset of features involved in the final decision tree is called the set of reducts of the decision tree. The decision tree (see Fig. 1) consists of splitting nodes for testing unknown data samples, edges which represent the outcomes of the splits, and leaves which represent the class labels. Algorithms like ID3 and C4.5 use entropy (Eq. 1) and

Fig. 1: Decision Tree

2015 IEEE Symposium Series on Computational Intelligence, 978-1-4799-7560-0/15 $31.00 © 2015 IEEE, DOI 10.1109/SSCI.2015.35
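The final combination step from the abstract, majority voting over the decisions of all the trees, can be sketched as follows (an illustrative sketch; `majority_vote` is our own helper name, not from the paper):

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree class predictions for one pattern by majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]

# e.g. five trees voting on one test pattern
print(majority_vote(['A', 'B', 'A', 'A', 'B']))  # A
```

Both the plain random forest and the proposed PRF use this same combination rule; PRF differs only in how the per-tree feature subsets are drawn.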