Connecting the Out-of-Sample and Pre-Image Problems in Kernel Methods

Pablo Arias, Universidad de la República, parias@fing.edu.uy
Gregory Randall, Universidad de la República, randall@fing.edu.uy
Guillermo Sapiro, University of Minnesota, guille@umn.edu

Abstract

Kernel methods have been widely studied in the field of pattern recognition. These methods implicitly map the data, via the “kernel trick,” into a space which is more appropriate for analysis. Many manifold learning and dimensionality reduction techniques are simply kernel methods for which the mapping is explicitly computed. In such cases, two problems related with the mapping arise: the out-of-sample extension and the pre-image computation. In this paper we propose a new pre-image method based on the Nyström formulation for the out-of-sample extension, showing the connections between both problems. We also address the importance of normalization in the feature space, which has been ignored by standard pre-image algorithms. As an example, we apply these ideas to the Gaussian kernel, and relate our approach to other popular pre-image methods. Finally, we show the application of these techniques in the study of dynamic shapes.

1. Introduction

Kernel methods have been shown to be powerful techniques for studying non-linear data. The main idea behind these methods is to map the data into a space better suited for linear algorithms. The mapping, however, is often not explicitly computed, leading to the so-called “kernel trick”: the kernel function encodes the useful information about the mapping. Kernel methods have been used in numerous image processing and computer vision applications; see for example [24] for a comprehensive review of kernel methods. Kernel methods are closely related to manifold learning techniques such as those described in [2, 5, 10, 17, 23, 26]; see [3, 4, 13] for details.
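As a toy illustration (not part of the paper; all function names below are our own), the kernel trick for the Gaussian kernel can be sketched as follows: the kernel matrix is computed directly from pairwise distances, without ever forming the implicit feature map, and kernel PCA then makes the mapping explicit over the training set, as the manifold learning methods mentioned above do.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    # Pairwise Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)),
    # computed without ever constructing the (infinite-dimensional) feature map.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def kernel_pca_embedding(K, n_components=2):
    # Centre the kernel matrix in feature space, then take the leading
    # eigenvectors: this yields the explicitly computed mapping of the
    # training points, known only on the training set.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    w, V = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_components]
    w, V = w[idx], V[:, idx]
    return V * np.sqrt(np.maximum(w, 0.0))  # embedded training coordinates

X = np.random.RandomState(0).randn(20, 5)   # toy data: 20 points in R^5
K = gaussian_kernel_matrix(X, sigma=2.0)
Y = kernel_pca_embedding(K, n_components=2)
```

Because the mapping in `kernel_pca_embedding` is defined only at the training points, extending it to a new point (out-of-sample) and inverting it (pre-image) are exactly the two problems this paper connects.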
* Work partially supported by NSF, ONR, NGA, and DARPA. PA and GR performed part of this work while visiting ECE&IMA at the UofM. We thank the authors of [1] for the data and the third reviewer for pointing out important references.

The aim of these algorithms is to map the original dataset into a parameter space, usually of lower dimension. The mapping is associated with a kernel function, giving a different point of view on the manifold learning problem. As a result, new manifold learning algorithms have been developed using design techniques borrowed from kernel methods theory, e.g. [28].

In general, for manifold learning techniques, the mapping is only known over the training set. This mapping needs to be extended to new input points as they arrive, without having to re-compute the (often expensive) whole map. This is known as the out-of-sample problem. In addition, after operations are performed in the mapped (feature) space, the corresponding data point in the original space often needs to be computed. This is known as the pre-image problem. While both problems are treated separately in the literature, we show in this paper that they are closely related, and in particular, that the Nyström extension for the out-of-sample task can be extended to address the pre-image issue as well. We should note that most of the work on the pre-image problem has been done for the Gaussian kernel. This kernel has been widely used in the field of pattern classification and also for manifold learning. In [8, 9, 25] the Gaussian kernel is used to perform kernel principal component analysis (PCA) for image de-noising and shape manifold learning, outperforming ordinary PCA. In [7] a non-parametric probability density function is learned by assuming a Normal distribution in the feature space of a Gaussian kernel.

A common approach for studying both static and dynamic data is to first learn the non-linear manifold underlying the training set, e.g.
[1, 7, 9, 19]. The learned manifold is then used for diverse applications such as activity recognition and object tracking. In these cases, both the out-of-sample extension and the pre-image problem are central issues. The out-of-sample extension is critical to handle new data as it comes in, without the need to re-learn the manifold, a task which is computationally expensive and performed off-line. The pre-image computation is critical for working back in the original space, either for visualization (when computing an average, for example, or when using kernel PCA for de-noising or analysis), or for operations such as tracking in the video space.

The contribution of this paper is threefold. First, we pro-