A Fast Two-Stage Classification Method of Support Vector Machines

Jin Chen, Cheng Wang, Member, IEEE, and Runsheng Wang
ATR Laboratory, School of Electronic Science and Engineering
National University of Defense Technology
47 Yanwachi, Changsha 410073, China
chenjin_wonder@hotmail.com

Abstract-Classification of high-dimensional data generally requires enormous processing time. In this paper, we present a fast two-stage method of support vector machines, comprising a feature reduction algorithm and a fast multiclass method. First, principal component analysis is applied to the data for feature reduction and decorrelation, and a feature selection method is then used to further reduce the feature dimensionality. The criterion based on Bhattacharyya distance is revised to remove the influence of binary problems with large distances. Moreover, a simple method is proposed to reduce the processing time of multiclass problems, in which the binary SVM with the fewest support vectors (SVs) is selected iteratively to exclude the less similar class until the final result is obtained. In experiments on the hyperspectral data set 92AV3C, the proposed method achieves a much faster classification while preserving the high classification accuracy of SVMs.

I. INTRODUCTION

Pattern classification is important due to emerging applications such as hyperspectral classification, protein classification, speech recognition, and so on. Compared to traditional classification approaches, support vector machines (SVMs) have been found to be particularly promising because of their lower sensitivity to the curse of dimensionality [1]. The high generalization ability of SVMs is ensured by special properties of the optimal hyperplane, which maximizes the distance to the training examples in a high-dimensional feature space [2]. Another important property is their good generalization capability, supported by the sparse representation of the decision function.
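Because the decision function depends only on the support vectors, test-time cost scales with the number of SVs rather than with the training-set size. The following sketch (our own illustration using scikit-learn, not code from the paper; the synthetic data set and parameters are arbitrary) makes this sparsity visible:

```python
# Illustrative only: train an RBF SVM and count its support vectors.
# The decision function is evaluated over SVs alone, so fewer SVs
# means a cheaper test phase.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = SVC(kernel="rbf", C=10.0).fit(X, y)

n_svs = clf.support_vectors_.shape[0]
print(f"{n_svs} support vectors out of {len(X)} training samples")
```

Typically only a fraction of the training samples become support vectors, which is what makes reduced-set and SV-count-aware methods attractive.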
However, in many applications, data are represented by high-dimensional feature vectors and a large number of classes. Both factors increase the computational complexity of the test phase of SVMs. As a result, for such classification problems, SVMs may not be comparable to traditional classifiers, such as the maximum likelihood classification (MLC) method, in terms of test time. In the literature, dimensionality reduction is motivated mainly by considerations of classification speed [3]. Dimensionality reduction mainly consists of feature selection and feature extraction approaches. Feature selection methods can be further classified into two categories: filter and wrapper methods [4]. A filter method employs intrinsic properties of the data, such as the Mahalanobis class separability measure, as its criterion, while a wrapper method evaluates feature subsets based on the performance of the classifier, such as the classification error rate. Feature extraction methods mainly include principal component analysis (PCA), independent component analysis (ICA), and kernel principal component analysis (KPCA); a comparison of these methods for dimensionality reduction in SVMs can be found in [5].

SVMs were originally designed for binary classification. One-against-all (OAA) [6] and one-against-one (OAO) [7], [8] are the two most common methods of addressing the multiclass classification problem. The discrimination of OAA between an information class and all the others often leads to the estimation of complex discriminant functions [9]. OAO needs C(C-1)/2 binary SVMs for one classification, which may result in slow classification. To obtain a faster classification, the directed acyclic graph SVM (DAGSVM) [10] and the binary tree of SVMs (BTS) [11] were proposed to reduce the number of binary SVMs evaluated by OAO. DAGSVM needs only C-1 binary SVMs, and BTS needs log_{4/3}((C+3)/4) binary SVMs on average for one classification.
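The per-sample evaluation counts quoted above can be compared directly. The short sketch below (our illustration, not from the paper) tabulates them for a given number of classes C, using the average-case BTS figure log_{4/3}((C+3)/4):

```python
# Number of binary SVM evaluations per test sample for each
# multiclass scheme discussed in the text.
import math

def evaluations_per_test(C):
    return {
        "OAO": C * (C - 1) // 2,        # all pairwise classifiers
        "OAA": C,                        # one classifier per class
        "DAGSVM": C - 1,                 # one elimination per step
        "BTS_avg": math.log((C + 3) / 4, 4 / 3),  # average-case depth
    }

counts = evaluations_per_test(16)
print(counts)
```

For C = 16 classes, OAO evaluates 120 classifiers per sample while DAGSVM evaluates 15, which is why schemes needing only C-1 evaluations are attractive for fast classification.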
There are also other multiclass SVM methods that try to achieve higher classification accuracy, such as the pairwise decision tree of SVMs (PDTSVM) [12] and error-correcting output codes (ECOC) methods [13]-[15]. PDTSVM selects binary SVMs with larger geometric margins and reduces the number of layers to decrease the accumulated errors, while ECOC methods use error-correcting coding theory to improve the decision accuracy. In addition, reduced-set methods [16], which approximate the original solution with a much smaller number of newly constructed support vectors (SVs), have also been proposed to obtain a fast classification of SVMs.

In this paper, we propose a fast two-stage method for classification with SVMs, depicted in Fig. 1. First, the data are decorrelated with principal component analysis (PCA), after which a feature selection algorithm further reduces the dimensionality. For the feature selection, we revise the criterion based on Bhattacharyya distance to remove the influence of binary problems with large distances. To further reduce the computational complexity, a simple method called fast OAO (FOAO) is proposed, which combines the C-1 binary SVMs with the fewest support vectors. Experiments on an Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data set demonstrate that the proposed method can be much faster than other multiclass SVM methods.

Proceedings of the 2008 IEEE International Conference on Information and Automation, June 20-23, 2008, Zhangjiajie, China
978-1-4244-2184-8/08/$25.00 © 2008 IEEE
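The FOAO idea of iteratively applying the cheapest remaining pairwise classifier can be sketched as follows. This is our own reading of the scheme, not the authors' reference code: all C(C-1)/2 pairwise SVMs are trained as in OAO, and at test time the classifier with the fewest SVs among those whose two classes are both still candidates is applied, eliminating the losing class, so that only C-1 evaluations are needed per sample.

```python
# Sketch of a FOAO-style elimination (illustrative interpretation).
from itertools import combinations
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=4, random_state=1)
classes = np.unique(y)

# Train all pairwise binary SVMs, keyed by class pair (as in OAO).
models = {}
for a, b in combinations(classes, 2):
    mask = (y == a) | (y == b)
    models[(a, b)] = SVC(kernel="rbf").fit(X[mask], y[mask])

def foao_predict(x):
    candidates = set(classes)
    while len(candidates) > 1:
        # Among classifiers covering two still-candidate classes,
        # pick the one with the fewest support vectors (cheapest).
        pairs = [p for p in models if p[0] in candidates and p[1] in candidates]
        a, b = min(pairs, key=lambda p: len(models[p].support_))
        winner = models[(a, b)].predict(x.reshape(1, -1))[0]
        candidates.discard(a if winner == b else b)  # drop the loser
    return candidates.pop()

preds = np.array([foao_predict(x) for x in X])
print("training accuracy:", (preds == y).mean())
```

Each pass through the loop removes one class, so exactly C-1 binary SVMs are evaluated per sample, matching the count stated in the text; the ordering by SV count is the heuristic that distinguishes this sketch from a plain DAGSVM-style elimination.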