Improving Support Vector Classification via the Combination of Multiple Sources of Information

Javier M. Moguerza 1, Alberto Muñoz 2, and Isaac Martín de Diego 2

1 University Rey Juan Carlos, c/ Tulipán s/n, 28933 Móstoles, Spain
j.moguerza@escet.urjc.es
2 University Carlos III de Madrid, c/ Madrid 126, 28903 Getafe, Spain
alberto.munoz@uc3m.es, ismdiego@est-econ.uc3m.es

Abstract. In this paper we describe several new methods to build a kernel matrix from a collection of kernels. This kernel will be used for classification purposes using Support Vector Machines (SVMs). The key idea is to extend the concept of a linear combination of kernels to the concept of a functional (matrix) combination of kernels. The functions involved in the combination take advantage of class conditional probabilities and nearest neighbour techniques. The proposed methods have been successfully evaluated on a variety of real data sets against a battery of powerful classifiers and other kernel combination techniques.

1 Introduction

Support Vector Machines (SVMs) have proven to be a successful tool for the solution of a wide range of classification problems since their introduction in [3]. The method uses as its primary source of information a kernel function K(x_i, x_j), where K is a Mercer kernel and x_i, x_j represent data points in the sample. By the Representer Theorem (see for instance [16]), SVM classifiers always take the form f(x) = Σ_i α_i K(x, x_i). The approximation and generalization capacity of the SVM is determined by the choice of the kernel K [4]. A common way to obtain SVM kernels is to consider a linear differential operator D and choose K as the Green's function of the operator D*D, where D* is the adjoint operator of D [15]. It is easy to show that ||f||² = ||Df||²_{L2}. Thus we are imposing smoothing conditions on the solution f. However, it is hard to know in advance which particular smoothing conditions to impose for a given data set.
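The Representer-theorem form of the SVM classifier can be made concrete with a small sketch. The RBF kernel, the toy data, and in particular the dual coefficients α_i are illustrative placeholders (a real SVM obtains the α_i by solving the dual quadratic program); only the functional form f(x) = Σ_i α_i K(x, x_i) is taken from the text.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2), a standard Mercer kernel
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy training sample: two well-separated clusters with labels -1 / +1
X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
y = np.array([-1, -1, 1, 1])

# Hypothetical dual coefficients; here we simply set alpha_i = y_i
# so that each training point pulls the decision toward its own class.
alpha = y.astype(float)

def f(x):
    # Representer-theorem expansion: f(x) = sum_i alpha_i K(x, x_i)
    return float((alpha * rbf_kernel(x[None, :], X)[0]).sum())

print(np.sign(f(np.array([0.1, 0.0]))))   # -1.0: near the negative cluster
print(np.sign(f(np.array([2.0, 2.0]))))   # 1.0: near the positive cluster
```

The sign of f(x) gives the predicted class; changing the kernel K changes the shape of the resulting decision boundary, which is exactly why the choice (or combination) of kernels matters.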
Fortunately, kernels are straightforwardly related to similarity (or, equivalently, distance) measures, and this information is actually available in many data analysis problems. Nevertheless, using a single kernel may not be enough to solve the problem under consideration accurately. This happens, for instance, when dealing with text mining problems, where analysis results may vary depending on the document similarity measure chosen [9]. Thus, the information provided by a single similarity measure (kernel) may not be enough for classification purposes, and the combination of kernels appears as an interesting alternative to the choice of the 'best' kernel.

A. Fred et al. (Eds.): SSPR&SPR 2004, LNCS 3138, pp. 592–600, 2004.
© Springer-Verlag Berlin Heidelberg 2004