Nonlinear Multiple Kernel Learning via Mixture of Probabilistic Kernel Discriminant Analysis

Zheng Zhao (zhaozheng@asu.edu), Computer Science & Engineering, Arizona State University
Jieping Ye (jieping.ye@asu.edu), Computer Science & Engineering, Arizona State University
Shipeng Yu (shipeng.yu@siemens.com), CAD and Knowledge Solutions, Siemens Medical Solutions USA, Inc.
Huan Liu (huan.liu@asu.edu), Computer Science & Engineering, Arizona State University

Abstract

Multiple kernel learning (MKL) provides a powerful tool for heterogeneous data integration. Most existing MKL formulations are based on a linear kernel combination, which, however, restricts the flexibility of the learning model. In this paper, we propose a novel nonlinear multiple kernel learning formulation based on model combination. The proposed formulation (called MPKDA) is derived from a novel probabilistic model for kernel discriminant analysis (KDA) and its mixture. Experimental results on various real applications demonstrate that the proposed MPKDA model provides competitive performance compared with representative approaches. We also analyze the relationship between the proposed model and existing KDA-based MKL formulations, and show how the proposed MPKDA model can be used to handle missing data and perform localized multiple kernel learning (LMKL).

1 Introduction

Kernel methods, such as the support vector machine (SVM) [Vap95], have gained popularity due to their successful applications in solving a wide range of real-world problems [TFM07, BHOS+08]. Kernel methods [SS02, STC04] work by embedding the input data into a high-dimensional feature space, where the embedding is determined uniquely by specifying a kernel function that implicitly computes the dot product between data points in the feature space. Thus, one of the central problems in kernel methods is learning a good kernel.

Recently, multiple kernel learning (MKL), which learns a linear combination of multiple input kernels, has been shown to improve classification performance [LCB+04]. In [LCB+04], an optimal kernel matrix is learned as a linear combination of a set of pre-specified kernel matrices, with the combination coefficients determined by solving a convex optimization problem. In [BLMIJkltSa04], this problem is reformulated as the support kernel machine (SKM), which is further recast as a semi-infinite linear program (SILP) to handle large-scale problems [SRSS06, RBCG07]. Similar problems are also studied in [TK06, LJN06, MP07]. While most existing approaches are based on SVM, the MKL formulation in [FDBR04] is based on kernel discriminant analysis (KDA). The problem is reformulated as a semidefinite program (SDP) in [KMB06], which is further extended to handle multiclass problems in [YCJ07]. MKL provides a powerful tool for integrating multiple data sources with heterogeneous representations and has been successfully applied in many real-world problems [BTOM07].
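To make the contrast with the proposed nonlinear formulation concrete, the following minimal Python sketch (not part of the formulations cited above, and with weights fixed by hand purely for illustration) shows the linear kernel combination K = sum_m mu_m K_m that standard MKL methods such as [LCB+04] optimize over; in actual MKL the nonnegative weights mu_m are learned jointly with the classifier.

    # Minimal sketch of a linear combination of base kernels, as used in
    # standard (linear) MKL. Weights are hand-picked here for illustration only.
    import numpy as np

    def linear_kernel(X, Y):
        # K(x, y) = <x, y>
        return X @ Y.T

    def rbf_kernel(X, Y, gamma=1.0):
        # K(x, y) = exp(-gamma * ||x - y||^2)
        sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
        return np.exp(-gamma * sq)

    def combined_kernel(X, Y, mu=(0.5, 0.5), gamma=1.0):
        # Convex combination of base kernels; any mu_m >= 0 keeps the
        # combined Gram matrix positive semidefinite.
        return mu[0] * linear_kernel(X, Y) + mu[1] * rbf_kernel(X, Y, gamma)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(5, 3))
        K = combined_kernel(X, X)
        # The combined Gram matrix stays symmetric and positive semidefinite.
        print(K.shape, np.allclose(K, K.T),
              bool(np.all(np.linalg.eigvalsh(K) > -1e-10)))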