z-SVM: An SVM for Improved Classification of Imbalanced Data

Tasadduq Imam, Kai Ming Ting, and Joarder Kamruzzaman

Gippsland School of Information Technology, Monash University, Australia
{tasadduq, kaiming, joarder}@infotech.monash.edu.au

Abstract. Recent literature has revealed that the decision boundary of a Support Vector Machine (SVM) classifier skews towards the minority class for imbalanced data, resulting in a high misclassification rate for minority samples. In this paper, we present a novel strategy for SVM in the class imbalanced scenario. In particular, we focus on orienting the trained decision boundary of SVM so that a good margin between the decision boundary and each of the classes is maintained, and classification performance is improved for imbalanced data. In contrast to existing strategies that introduce additional parameters, whose values must be determined through empirical search involving multiple SVM trainings, our strategy corrects the skew of the learned SVM model automatically, irrespective of the choice of learning parameters and without multiple SVM trainings. We compare our strategy with SVM and with SMOTE, a widely accepted strategy for imbalanced data, applied to SVM on five well-known imbalanced datasets. Our strategy demonstrates improved classification performance for imbalanced data and is less sensitive to the selection of SVM learning parameters.

Keywords: class imbalance, support vector machine, SMOTE, z-SVM.

1 Introduction

The Support Vector Machine (SVM) classifier [1,2] has found popularity in a wide range of classification tasks due to its strong performance in binary classification scenarios [3,4,5,6]. Given a dataset, an SVM aims at finding the discriminating hyperplane that maintains an optimal margin from the boundary examples, called support vectors. An SVM thus focuses on improving generalization from the training data.
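The margin idea above can be sketched with scikit-learn (an illustration under assumed, arbitrary parameters, not the implementation used in this paper): the fitted hyperplane is determined solely by the support vectors it retains from the training set.

```python
# Illustrative sketch only: train a linear SVM on synthetic binary data
# and inspect the support vectors that define its margin.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-feature binary dataset; all parameters here are illustrative.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

clf = SVC(kernel="linear", C=1.0)  # C chosen arbitrarily for the sketch
clf.fit(X, y)

# Only the boundary examples (support vectors) determine the hyperplane;
# the remaining training points could be removed without changing it.
print("support vectors:", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, y))
```

Note that only a fraction of the 200 training points end up as support vectors; this is the sense in which the SVM summarizes the training data by its boundary examples.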
A number of recent works, however, have highlighted that the decision boundary of an SVM trained with imbalanced data is skewed towards the minority class, and as such, the prediction accuracy for the minority class is low compared to that for the majority class. Strategies like SVM ensembles trained at varying sampling rates [7,8], SVM with differing costs [9] and SMOTE (Synthetic Minority Oversampling Technique) [10,11] have, therefore, been investigated to improve minority classification accuracy for imbalanced data. A concern regarding the use of these strategies in practical applications is the necessity to pre-select a good value of the parameters that are introduced in

A. Sattar and B.H. Kang (Eds.): AI 2006, LNAI 4304, pp. 264–273, 2006.
© Springer-Verlag Berlin Heidelberg 2006