Classification using Probabilistic Random Forest
Rajhans Gondane
System Science and Automation
Indian Institute of Science
Bangalore, India
rajhans.gondane@gmail.com
V. Susheela Devi
Computer Science and Automation
Indian Institute of Science
Bangalore, India
susheela@csa.iisc.ernet.in
Abstract—The probabilistic random forest is a classification model that chooses a subset of features for each tree in the random forest depending on the F-scores of the features. In other words, the probability of a feature being chosen for the feature subset increases with the F-score of that feature in the dataset. A larger F-score indicates that the feature is more discriminative. The features are drawn stochastically, with the expectation that features with higher F-scores will appear in the chosen feature subset. The class label of a pattern is obtained by combining the decisions of all the decision trees by majority voting. Experimental results reported on a number of benchmark datasets demonstrate that the proposed probabilistic random forest achieves better performance than the random forest.
I. INTRODUCTION
In recent times the amount of data stored in databases has been growing steadily. This growth calls for techniques that transform the data into meaningful information and knowledge. The decision tree classifier is a supervised learning approach which selects a set of reducts from the feature space and generates decision rules based on known data. In a decision tree, the goal is to minimize the number of decision rules while classifying the test data with high accuracy. The efficient ID3 algorithm, proposed in 1979, decides the reducts based on information gain. However, ID3 leads to inaccurate decisions when there is noise in the sample data, and if the decision tree is binary, a long sequence of tests is needed [1].
In a random forest, the concept of bagging is used to generate a diverse ensemble of classifiers [2], but the problem with applying bagging alone is that the first splitting node of each decision tree remains the same [3] (even if the data are sampled with replacement). The random forest therefore uses bagging together with randomly selected inputs at each node to grow each tree [3]. This extra randomness makes the accuracy of the random forest as good as that of AdaBoost, and sometimes even better [3].
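These two sources of randomness can be illustrated with a minimal sketch; the helper names, dataset, and subset sizes below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X, y, rng):
    """Bagging step: draw n training patterns with replacement."""
    n = X.shape[0]
    idx = rng.integers(0, n, size=n)
    return X[idx], y[idx]

def random_feature_subset(n_features, m, rng):
    """Random-input step: pick m candidate features for one split node."""
    return rng.choice(n_features, size=m, replace=False)

# toy data: 100 patterns, 10 features, binary labels
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100)

Xb, yb = bootstrap_sample(X, y, rng)          # one tree's bootstrap replicate
subset = random_feature_subset(10, 3, rng)    # candidates for one split node
```

A full random forest repeats the bootstrap step once per tree and the feature-subset step at every split node of every tree.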
The roulette wheel selection strategy uses a roulette-wheel mechanism to probabilistically select individuals based on some measure of their performance [4]. Roulette wheel selection is stochastic sampling with replacement (SSR), so it has zero bias but potentially unlimited spread [4]. Individuals are mapped one-to-one onto contiguous segments of the interval [0, 1]; an individual occupying a larger segment of the wheel has a higher probability of being selected than one occupying a smaller segment [4].
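The mapping of individuals to segments of [0, 1] can be sketched as follows; the fitness values are made up for illustration:

```python
import numpy as np

def roulette_wheel_select(scores, k, rng):
    """Stochastic sampling with replacement: each individual occupies a
    segment of [0, 1] proportional to its score; k spins pick k winners."""
    scores = np.asarray(scores, dtype=float)
    probs = scores / scores.sum()      # segment sizes
    cumulative = np.cumsum(probs)      # segment boundaries on [0, 1]
    spins = rng.random(k)              # k uniform 'wheel spins'
    return np.searchsorted(cumulative, spins)

rng = np.random.default_rng(42)
winners = roulette_wheel_select([5.0, 1.0, 1.0, 1.0], k=1000, rng=rng)
# individual 0 owns 5/8 of the wheel, so it wins most of the spins
```

Because sampling is with replacement, the same individual can win any number of spins, which is what gives SSR its unbounded spread.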
The F-score is an efficient feature ranking strategy that is independent of the classifier used [5]. The F-score of a feature is the ratio of the discrimination between the sets of patterns of different classes to the discrimination within the set of patterns of each class. A larger F-score indicates that the feature is more discriminative [5].
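For a two-class problem, the per-feature F-score can be computed as below. This sketch follows the usual between-class over within-class ratio; the exact formulation in [5] may differ in detail, and the toy data are made up for illustration:

```python
import numpy as np

def f_score(X, y):
    """Per-feature F-score for a two-class problem: between-class
    discrimination divided by within-class discrimination."""
    X = np.asarray(X, dtype=float)
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    between = (pos.mean(axis=0) - mean_all) ** 2 \
            + (neg.mean(axis=0) - mean_all) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return between / within

# feature 0 separates the classes cleanly; feature 1 is noise
X = np.array([[0.0, 5.0], [0.1, 1.0], [0.2, 4.0],
              [5.0, 2.0], [5.1, 3.0], [5.2, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])
f = f_score(X, y)    # f[0] is far larger than f[1]
```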
The probabilistic random forest (PRF) proposed by us is an ensemble learning model constructed using the concepts of the F-score and the roulette wheel selection strategy. In PRF, instead of choosing the feature subset at random for every tree in the random forest, we choose the feature subset using roulette wheel selection, in which the more discriminative features have a greater probability of being selected for the feature subset.
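Putting the two ideas together, one tree's feature subset in PRF can be sketched as a roulette-wheel draw weighted by F-scores. The subset size and the choice to sample without replacement are assumptions made here for illustration, not details taken from the paper:

```python
import numpy as np

def prf_feature_subset(f_scores, m, rng):
    """Draw m features for one tree, with selection probability
    proportional to each feature's F-score (roulette-wheel style).
    Sampling without replacement is an assumption of this sketch."""
    probs = np.asarray(f_scores, dtype=float)
    probs = probs / probs.sum()
    return rng.choice(len(probs), size=m, replace=False, p=probs)

rng = np.random.default_rng(7)
scores = np.array([10.0, 0.5, 0.5, 0.5, 0.5])   # feature 0 is discriminative
subsets = [prf_feature_subset(scores, m=2, rng=rng) for _ in range(200)]
# feature 0 dominates the wheel, so it appears in most of the 200 subsets
```

Repeating this draw once per tree yields an ensemble whose trees are still diverse but are biased toward the discriminative features.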
II. BACKGROUND THEORY
A. Decision Tree
The decision tree classifier is a supervised learning method that is built from a set of training examples [6]. Its construction uses a recursive top-down partitioning (divide-and-conquer) process to divide the search space into several subsets.
In a decision tree, the best feature is selected as the split point so that the data in each descendant subset are purer than the data in the parent superset, until the data are finally classified into classes. Each path from the root to a leaf node is known as a decision rule, and the subset of features involved in the final decision tree is called the reducts of the decision tree.
The decision tree (see Fig. 1) consists of splitting nodes for testing unknown data samples, edges which represent the outcomes of the splits, and leaves which represent the class labels. Algorithms like ID3 and C4.5 use entropy (Eq. 1) and
Fig. 1: Decision Tree
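The entropy referred to as Eq. 1 is the standard Shannon impurity measure that ID3 and C4.5 use when scoring candidate splits. A minimal sketch, with an information-gain helper added here for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label distribution (the impurity measure
    used by ID3 and C4.5 when scoring candidate splits)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left, right):
    """Entropy reduction achieved by a binary split of `labels`
    into the subsets `left` and `right`."""
    n = len(labels)
    return entropy(labels) \
        - (len(left) / n) * entropy(left) \
        - (len(right) / n) * entropy(right)

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pure_split = information_gain(y, y[:4], y[4:])   # perfect split: 1 bit gained
```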
2015 IEEE Symposium Series on Computational Intelligence
978-1-4799-7560-0/15 $31.00 © 2015 IEEE
DOI 10.1109/SSCI.2015.35