Model Selection Using a Class of Kernels with an Invariant Metric

Akira Tanaka 1, Masashi Sugiyama 2, Hideyuki Imai 1, Mineichi Kudo 1, and Masaaki Miyakoshi 1

1 Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, 060-0814, Japan, {takira,imai,mine,miyakosi}@main.eng.hokudai.ac.jp
2 Department of Computer Science, Tokyo Institute of Technology, Meguro-ku, Tokyo, 152-8552, Japan, sugi@cs.titech.ac.jp

Abstract. Learning based on kernel machines is widely known as a powerful tool for various fields of information science, such as pattern recognition and regression estimation. The efficacy of the model in kernel machines depends on the distance between the unknown true function and the linear subspace, specified by the training data set, of the reproducing kernel Hilbert space corresponding to an adopted kernel. In this paper, we propose a framework for the model selection of kernel-based learning machines that incorporates a class of kernels with an invariant metric.

1 Introduction

Learning based on kernel machines[1] is widely known as a powerful tool for various fields of information science, such as pattern recognition and regression estimation. Many kernel machines have been proposed, represented by the support vector machine[2] and kernel ridge regression[3, 4]. In these methods, kernels are recognized as useful tools for calculating the inner product in high-dimensional feature spaces[3, 4]. On the other hand, according to the theory of reproducing kernel Hilbert spaces[5, 6], the essence of using kernels in learning problems is that the unknown target (classifiers in pattern recognition problems, unknown true functions in regression estimation problems, and so on) belongs to the reproducing kernel Hilbert space corresponding to the adopted kernel.
On the basis of this essence, Ogawa formulated a learning problem as an inversion problem of a linear operator from the reproducing kernel Hilbert space corresponding to the adopted kernel onto a certain vector space associated with the given training data set, and constructed a series of learning machines, named "(parametric) projection learning", that give a good approximation of the orthogonal projection of the unknown true function onto the linear subspace, specified by the given training
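To make the geometric picture concrete, the following is a minimal sketch (not the projection learning machines discussed above, but the closely related kernel ridge regression) showing that a kernel-based estimate always lies in the finite-dimensional subspace of the reproducing kernel Hilbert space spanned by the representers k(., x_i) of the training inputs. The Gaussian kernel, the regularization parameter `lam`, and the width `sigma` are illustrative choices, not values from the paper.

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    # Gram matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, y, lam=1e-3, sigma=1.0):
    # Coefficients alpha solve (K + lam I) alpha = y; the learned function
    # f(x) = sum_i alpha_i k(x, x_i) is an element of the linear subspace
    # of the RKHS specified by the training inputs x_1, ..., x_n.
    K = gauss_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return alpha

def kernel_ridge_predict(X_train, alpha, X_new, sigma=1.0):
    # Evaluate f at new points via the kernel expansion.
    return gauss_kernel(X_new, X_train, sigma) @ alpha

# Toy usage: noisy samples of a sine function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
alpha = kernel_ridge_fit(X, y, lam=1e-2, sigma=0.7)
y_hat = kernel_ridge_predict(X, alpha, X, sigma=0.7)
```

Whatever the learning criterion, the estimate cannot leave this subspace, which is why the distance between the unknown true function and the subspace bounds the achievable accuracy and motivates model (kernel) selection.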