2012 IEEE International Conference on Granular Computing
Multiclass SVM with Ramp Loss for Imbalanced Data Classification
Piyaphol Phoungphol¹, Yanqing Zhang¹, Yichuan Zhao², and Bismita Srichandan¹
¹Department of Computer Science, ²Department of Mathematics and Statistics
Georgia State University
Atlanta, GA 30302-3994, USA
pphoungphol1@gsu.edu, yzhang@cs.gsu.edu, yichuan@gsu.edu, bsrichandan1@student.gsu.edu
Abstract-Class imbalance is a common problem encountered in applying machine learning tools to real-world data. It causes most classifiers to perform sub-optimally and yield very poor performance when a dataset is highly imbalanced. In this paper, we study a new method of formulating a multiclass SVM problem for imbalanced datasets to improve classification performance. The proposed method applies a cost-sensitive approach and a ramp loss function to the Crammer & Singer multiclass SVM formulation. Experimental results on multiple UCI datasets show that the proposed solution can effectively cure the problem when the datasets are noisy and highly imbalanced.
Keywords-Multiclass classification; Imbalanced data; Ramp loss; SVM
I. INTRODUCTION
Imbalanced data classification is a very common and often serious problem in several domains, for example, oil spill detection from satellite images, protein structure categorization based on a primary protein sequence, and detection of rare but important cases such as fraud, intrusion, and medical conditions. Unfortunately, most traditional classification techniques perform sub-optimally when the representation of one class is significantly dominated by the others, because classifiers tend to be biased toward the majority class. Fig. 1 shows the ideal separating line of a support vector machine (SVM) between two imbalanced classes alongside the biased line the SVM actually learns. Many research works try to improve traditional techniques or develop new algorithms to solve the class imbalance problem. However, most of those studies focus only on the binary (two-class) case. Only a few studies have addressed the multiclass imbalance problem, which is much more common and complex in real-world applications. In this paper, we study the multiclass imbalanced data problem and develop new classification algorithms that can effectively handle the imbalance problem in many domains.
A. Preliminaries
In classifying an m-class problem from n training samples x_1, x_2, ..., x_n, where x_i is a point in feature space R^d with labels y_1, y_2, ..., y_n ∈ {1, ..., m}, Crammer & Singer [1] suggested a learning model of the form:

f(x) = argmax_{r ∈ Y} W_r · x    (1)

978-1-4673-2311-6/12/$31.00 ©2012 IEEE
[Figure 1: scatter plot of two imbalanced classes, showing the "Ideal Line" and the biased "SVM" separating line.]
Figure 1: SVM on imbalanced dataset is biased toward the
major class.
where W is an m × d matrix and W_r, the r-th row of W, is the coefficient vector of class r. Therefore, the predicted label of x is the class having the highest similarity score. To find the best coefficients of matrix W in (1), we can formulate the task as an optimization problem (2), with an objective to minimize the norm of W and the total classification error, where C is a regularization parameter and ξ_i is the error in predicting point i.
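As a concrete illustration of the decision rule in (1), the following sketch (not from the paper) applies argmax scoring with a small hypothetical weight matrix W; the example values and the 0-based class indices are assumptions for illustration only:

```python
import numpy as np

# Hypothetical learned weight matrix W: one row of coefficients per class
# (here m = 3 classes, d = 2 features).
W = np.array([[ 1.0, -0.5],   # class 0
              [-0.3,  0.8],   # class 1
              [ 0.2,  0.2]])  # class 2

def predict(x):
    """Decision rule (1): pick the class whose row of W scores x highest."""
    return int(np.argmax(W @ x))

x = np.array([2.0, 1.0])
print(predict(x))  # scores are [1.5, 0.2, 0.6], so class 0 wins
```

Each row of W acts as a per-class scoring function, so prediction is a single matrix-vector product followed by an argmax.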
min_W  (1/2)‖W‖² + C Σ_{i=1}^{n} ξ_i
s.t.  W_{y_i} · x_i − W_r · x_i + ξ_i ≥ 1,  ∀i, ∀r ≠ y_i
      ξ_i ≥ 0,  ∀i    (2)

B. Problems
While the formulation in (2) usually yields a good result in most cases, it may perform poorly when the training data is highly imbalanced or noisy.
1) Imbalanced data: the total error is dominated by errors on majority-class instances. Thus, a classifier will certainly be biased toward the majority class in order to minimize the total error, as shown in Fig. 1.
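A toy numeric sketch (hypothetical counts and slack values, not from the paper) of why the unweighted slack sum in (2) favors a majority-biased boundary:

```python
# Hypothetical imbalanced dataset: 950 majority points, 50 minority points.
n_major, n_minor = 950, 50

# Boundary A: biased toward the majority class -- every minority point is
# misclassified with slack 1.0, and the majority points incur no slack.
loss_biased = n_minor * 1.0       # 50.0

# Boundary B: balanced -- a small slack of 0.1 on every majority point.
loss_balanced = n_major * 0.1     # ~95.0

# The biased boundary has the lower objective, so a plain SVM prefers it.
print(loss_biased < loss_balanced)  # True
```

Sacrificing the entire minority class is cheaper, under the plain sum, than slightly perturbing the majority class, which is exactly the bias depicted in Fig. 1.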
2) Noisy data: the error of each point can take values ranging from 0 to +∞; therefore, the errors of a few bad or noisy points can severely inflate the overall error, which deteriorates the classifier's performance.
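To make this concrete, the following sketch (not the paper's exact formulation) contrasts the unbounded hinge-style slack with a ramp loss that caps each point's contribution; the cap parameter s = -1.0 is an assumed, commonly used choice:

```python
def hinge(margin):
    """Hinge-style slack as in (2): unbounded as the margin goes to -inf."""
    return max(0.0, 1.0 - margin)

def ramp(margin, s=-1.0):
    """Ramp loss: the hinge loss clipped at 1 - s (here capped at 2.0)."""
    return min(hinge(margin), 1.0 - s)

outlier_margin = -10.0        # a noisy point deep inside the wrong class
print(hinge(outlier_margin))  # 11.0 -- one outlier dominates the total error
print(ramp(outlier_margin))   # 2.0  -- the contribution is bounded
```

Because the ramp loss is bounded, a handful of noisy outliers cannot dominate the objective the way they can under the plain hinge-style slack sum.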
In both cases, we notice that the plain summation of all errors is not suitable for use as the objective function of the optimization problem. In the next section, we will