Knowl Inf Syst (2017) 50:827–850
DOI 10.1007/s10115-016-0957-5
REGULAR PAPER
DBMUTE: density-based majority under-sampling technique
Chumphol Bunkhumpornpat¹ · Krung Sinapiromsaran²
Received: 16 January 2015 / Revised: 1 February 2016 / Accepted: 13 May 2016 /
Published online: 27 May 2016
© Springer-Verlag London 2016
Abstract Class imbalance is a challenging problem that leads to unsatisfactory
classification performance on the minority class. A trivial classifier is biased against
minority instances because they make up only a tiny fraction of the data. In this paper, our
density function is defined as the distance along the shortest path between each majority
instance and a minority-cluster pseudo-centroid in an underlying cluster graph. A short path
implies that the majority instance lies among dense, highly overlapping minority instances.
In contrast, a long path indicates that instances are sparse. A new under-sampling algorithm
is proposed to eliminate majority instances with short path distances because these instances
are insignificant and obscure the classification boundary in the overlapping region. The
results show improved prediction of the minority class for various classifiers on different
UCI datasets.
Keywords Pattern recognition · Class imbalance · Under-sampling · Density-based
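The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' DBMUTE implementation; it only assumes one plausible reading of the abstract. The eps-neighborhood graph, the pseudo-centroid chosen as the minority instance closest to the minority mean, and the names `eps`, `threshold`, `pseudo_centroid`, and `under_sample` are all hypothetical choices made for illustration.

```python
import heapq
import math


def shortest_path_distances(points, source, eps):
    """Dijkstra over an assumed eps-neighborhood graph.

    Two points are joined by an edge when their Euclidean distance is at
    most eps; the edge weight is that distance.
    """
    dist = [math.inf] * len(points)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v in range(len(points)):
            if v == u:
                continue
            w = math.dist(points[u], points[v])
            if w <= eps and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def pseudo_centroid(points, members):
    """Index of the member closest to the members' mean (assumed definition)."""
    dim = len(points[0])
    mean = [sum(points[i][k] for i in members) / len(members) for k in range(dim)]
    return min(members, key=lambda i: math.dist(points[i], mean))


def under_sample(points, labels, eps, threshold, minority=1):
    """Indices kept after dropping majority instances with short paths.

    Majority instances whose shortest-path distance to the minority
    pseudo-centroid is at most `threshold` are removed; unreachable ones
    have infinite distance and are kept.
    """
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    centre = pseudo_centroid(points, minority_idx)
    dist = shortest_path_distances(points, centre, eps)
    return [i for i, y in enumerate(labels) if y == minority or dist[i] > threshold]


# Toy data: three minority points (label 1) with one majority point among them
# and two majority points far away.
points = [(0, 0), (1, 0), (2, 0), (1.5, 0), (5, 0), (6, 0)]
labels = [1, 1, 1, 0, 0, 0]
kept = under_sample(points, labels, eps=1.2, threshold=1.0)
print(kept)  # [0, 1, 2, 4, 5]
```

In this toy run the majority instance at index 3 sits inside the minority cluster, so its path to the pseudo-centroid is short and it is removed, while the two distant majority instances are unreachable in the graph and are retained. This sketch treats all minority instances as a single cluster, which is a simplification.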
1 Introduction
Classification [18, 32] is a data mining methodology that is applied to predict the class label
of an unidentified instance. To accomplish the objective of classification, a classifier is
generated from known instances by a learning algorithm, such as a decision tree (C4.5)
[28], a neural network (multilayer perceptron) [18], a rule-based classifier (RIPPER) [10],
a probabilistic classifier (Naive Bayes) [15], a distance-based classifier (k-nearest neighbor)
Chumphol Bunkhumpornpat (corresponding author): chumphol.b@cmu.ac.th
Krung Sinapiromsaran: Krung.S@chula.ac.th
¹ Theoretical and Empirical Research Group, Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
² Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand