2012 IEEE International Conference on Granular Computing
Multiclass SVM with Ramp Loss for Imbalanced Data Classification
Piyaphol Phoungphol¹, Yanqing Zhang¹, Yichuan Zhao², and Bismita Srichandan¹
¹Department of Computer Science, ²Department of Mathematics and Statistics
Georgia State University
Atlanta, GA 30302-3994, USA
pphoungphol1@gsu.edu, yzhang@cs.gsu.edu, yichuan@gsu.edu, bsrichandan1@student.gsu.edu
Abstract-Class imbalance is a common problem encountered in applying machine learning tools to real-world data. It causes most classifiers to perform sub-optimally and yield very poor performance when a dataset is highly imbalanced. In this paper, we study a new method of formulating a multiclass SVM problem for imbalanced datasets to improve classification performance. The proposed method applies a cost-sensitive approach and a ramp loss function to the Crammer & Singer multiclass SVM formulation. Experimental results on multiple UCI datasets show that the proposed solution can effectively cure the problem when the datasets are noisy and highly imbalanced.
Keywords-Multiclass classification; Imbalanced data; Ramp loss; SVM
I. INTRODUCTION
Imbalanced data classification is a very common and often serious problem in several domains, for example, oil spill detection from satellite images, protein structure categorization based on a primary protein sequence, and detection of rare but important cases such as fraud, intrusion, and medical conditions. Unfortunately, most traditional classification techniques perform sub-optimally when the representation of one class is significantly dominated by the others, because classifiers tend to be biased toward the majority class. Fig. 1 shows the ideal separating line of a support vector machine (SVM) between two imbalanced classes alongside the biased line the SVM actually learns. Many research works try to improve traditional techniques or develop new algorithms to solve the class imbalance problem. However, most of those studies focus only on the binary (two-class) case. Only a few studies have addressed the multiclass imbalance problem, which is much more common and complex in real-world applications. In this paper, we study the multiclass imbalanced data problem and develop new classification algorithms that can effectively handle the imbalance problem in many domains.
A. Preliminaries
In classifying an m-class problem from n training samples x_1, x_2, ..., x_n, where x_i is a point in feature space R^d with labels y_1, y_2, ..., y_n ∈ {1, ..., m}, Crammer & Singer [1] suggested a learning model of the form:

f(x) = argmax_{r ∈ Y} W_r · x    (1)

978-1-4673-2311-6/12/$31.00 ©2012 IEEE
[Figure 1: scatter plot of two imbalanced classes, showing the "Ideal Line" and the biased "SVM" separating line.]
Figure 1: SVM on imbalanced dataset is biased toward the
major class.
where W is an m × d matrix and W_r, the r-th row of W, is the coefficient vector of class r. Therefore, the predicted label of x is the class having the highest similarity score. To find the best coefficients of matrix W in (1), we can formulate the task as an optimization problem (2), with an objective to minimize the norm of W and the total classification error, where C is a regularization parameter and ξ_i is the error in predicting point i.
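As a concrete illustration of the decision rule in (1), the following sketch (not from the paper) applies argmax scoring with a small hypothetical weight matrix W; the example values and the 0-based class indices are assumptions for illustration only:

```python
import numpy as np

# Hypothetical learned weight matrix W: one row of coefficients per class
# (here m = 3 classes, d = 2 features).
W = np.array([[ 1.0, -0.5],   # class 0
              [-0.3,  0.8],   # class 1
              [ 0.2,  0.2]])  # class 2

def predict(x):
    """Decision rule (1): pick the class whose row of W scores x highest."""
    return int(np.argmax(W @ x))

x = np.array([2.0, 1.0])
print(predict(x))  # scores are [1.5, 0.2, 0.6], so class 0 wins
```

Each row of W acts as a per-class scoring function, so prediction is a single matrix-vector product followed by an argmax.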
min_W  (1/2)‖W‖² + C Σ_{i=1}^{n} ξ_i
s.t.  W_{y_i} · x_i − W_r · x_i + ξ_i ≥ 1,  ∀i, ∀r ≠ y_i
      ξ_i ≥ 0,  ∀i    (2)

B. Problems
While the formulation in (2) usually yields a good result in most cases, it may perform poorly when the training data is highly imbalanced or noisy.
1) Imbalanced data: the total error is dominated by errors on majority-class instances. Thus, a classifier will certainly be biased toward the majority class in order to minimize the total error, as shown in Fig. 1.
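A toy numeric sketch (hypothetical counts and slack values, not from the paper) of why the unweighted slack sum in (2) favors a majority-biased boundary:

```python
# Hypothetical imbalanced dataset: 950 majority points, 50 minority points.
n_major, n_minor = 950, 50

# Boundary A: biased toward the majority class -- every minority point is
# misclassified with slack 1.0, and the majority points incur no slack.
loss_biased = n_minor * 1.0       # 50.0

# Boundary B: balanced -- a small slack of 0.1 on every majority point.
loss_balanced = n_major * 0.1     # ~95.0

# The biased boundary has the lower objective, so a plain SVM prefers it.
print(loss_biased < loss_balanced)  # True
```

Sacrificing the entire minority class is cheaper, under the plain sum, than slightly perturbing the majority class, which is exactly the bias depicted in Fig. 1.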
2) Noisy data: the error of each point can take values ranging from 0 to +∞; therefore, the errors of a few bad or noisy points can severely inflate the overall error, which deteriorates the classifier's performance.
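To make this concrete, the following sketch (not the paper's exact formulation) contrasts the unbounded hinge-style slack with a ramp loss that caps each point's contribution; the cap parameter s = -1.0 is an assumed, commonly used choice:

```python
def hinge(margin):
    """Hinge-style slack as in (2): unbounded as the margin goes to -inf."""
    return max(0.0, 1.0 - margin)

def ramp(margin, s=-1.0):
    """Ramp loss: the hinge loss clipped at 1 - s (here capped at 2.0)."""
    return min(hinge(margin), 1.0 - s)

outlier_margin = -10.0        # a noisy point deep inside the wrong class
print(hinge(outlier_margin))  # 11.0 -- one outlier dominates the total error
print(ramp(outlier_margin))   # 2.0  -- the contribution is bounded
```

Because the ramp loss is bounded, a handful of noisy outliers cannot dominate the objective the way they can under the plain hinge-style slack sum.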
In both cases, we notice that the plain summation of all errors is not suitable for use as the objective function of the optimization problem. In the next section, we will