Optimising multiple kernels for SVM by Genetic Programming

Laura Dioşan 1,2, Alexandrina Rogozan 1 and Jean-Pierre Pecuchet 1
1 LITIS, EA 4108, INSA, Rouen, France
2 Babeş-Bolyai University, Cluj-Napoca, Romania
lauras@cs.ubbcluj.ro, {arogozan,pecuchet}@insa-rouen.fr

Abstract. Kernel-based methods have shown strong performance in solving supervised classification problems. However, there is no rigorous methodology capable of learning or evolving the kernel function together with its parameters. In fact, most classic kernel-based classifiers use only a single kernel, whereas real-world applications have emphasized the need to consider a combination of kernels - also known as a multiple kernel (MK) - in order to boost the classification accuracy by adapting better to the characteristics of the data. Our aim is to propose an approach capable of automatically designing a complex multiple kernel (CMK) and of optimising its parameters by evolutionary means. In order to achieve this purpose we propose a hybrid model that combines a Genetic Programming (GP) algorithm and a kernel-based Support Vector Machine (SVM) classifier. Each GP chromosome is a tree that encodes the mathematical expression of an MK function. Numerical experiments show that the SVM involving our evolved complex multiple kernel (eCMK) performs better than the classical simple kernels. Moreover, on the considered data sets, our eCMK outperforms both a state-of-the-art convex linear MK (cLMK) and an evolutionary linear MK (eLMK). These results emphasize that the SVM algorithm requires a combination of kernels more complex than a linear one.

1 Introduction

Various classification techniques have been used in order to correctly detect the labels associated to some items. Kernel-based techniques (such as the Support Vector Machine (SVM) [1]) are an example of such intensively explored classifiers.
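To make the idea concrete, the sketch below builds one possible complex multiple kernel as a small expression tree over base kernels and plugs it into an SVM. This is a minimal illustration in the spirit of a GP chromosome, not the authors' actual operator set or parameters; the base kernels, their hyperparameters, and the toy data are all assumptions.

```python
# Hedged sketch: a multiple kernel expressed as a tree of sums and
# products of base kernels, used as a custom SVM kernel. Operators
# and parameter values are illustrative, not the paper's exact setup.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def multiple_kernel(X, Y):
    """Example CMK encoded by the tree (* (+ rbf poly) rbf2)."""
    k1 = rbf_kernel(X, Y, gamma=0.5)
    k2 = polynomial_kernel(X, Y, degree=2, coef0=1.0)
    k3 = rbf_kernel(X, Y, gamma=0.1)
    # Pointwise sums and products of valid (PSD) kernels are
    # themselves valid kernels, so the combination stays admissible.
    return (k1 + k2) * k3

# Train an SVM with the combined kernel on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel=multiple_kernel, C=1.0).fit(X, y)
print(clf.score(X, y))
```

A GP search would mutate and recombine such trees (swapping operators, base kernels, and their parameters) and score each candidate by the accuracy of the resulting SVM.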
These methods represent the data by means of a kernel function, which defines similarities between pairs of data points [2]. One reason for the success of kernel-based methods is that the kernel function takes relationships that are implicit in the data and makes them explicit, so that patterns are detected more easily. The selection of an appropriate kernel K is the most important design decision in SVM, since it implicitly defines the feature space F and the map φ. An SVM will work correctly even if we do not know the exact form of the features that are used in F.

The performance of an SVM algorithm also depends on several parameters. One of them, denoted C, controls the trade-off between maximizing the margin and classifying without error. The other parameters concern the kernel function. For simplicity, Chapelle [3] has proposed to denote all these parameters as hyperparameters. All that hyper