Learning Pullback Metrics on Dynamical Models

Fabio Cuzzolin
INRIA Rhône-Alpes
655 avenue de l'Europe
Montbonnot, France

Abstract

Consider the problem of classifying motions encoded as dynamical models of a certain class. Standard nearest neighbor classification then reduces to finding a suitable distance function in the space of such models. In this paper we present a supervised differential-geometric method for learning a Riemannian metric for a given class of dynamical models in order to improve classification performance. Given a training set of models, the optimal metric is selected among a family of pullback metrics induced by the Fisher information tensor through a parameterized diffeomorphism. Experimental results on action and identity recognition based on simple scalar features are reported, showing that learning a metric actually improves classification rates when compared with the Fisher geodesic distance and other classical distance functions.

1 INTRODUCTION

Human motion recognition is one of the most popular fields in computer vision, due both to its applicative potential and to its richness in terms of the technical issues involved. Consider then the problem of classifying a number of movements, represented as sequences of image features. Representing those sequences compactly as simple dynamical models has proved effective in problems like dynamic textures [7]. The motion classification problem then reduces to finding an appropriate distance function in the space of the dynamical models of the chosen class (a minimal nearest-neighbor sketch of this setting is given at the end of this passage). A number of distance functions for linear systems have already been introduced, in particular in the context of system identification: Martin's distance between cepstra [15], subspace angles [6], the gap metric [29] and its variants the nu-gap [27] and the graph metric [26], and kernel methods [24]. Besides, a vast literature can be found on dissimilarity measures between hidden Markov models, most of them variants of the Kullback-Leibler divergence [13].

However, a simple thought experiment is enough to understand that no single distance function can possibly outperform the others in each and every classification problem, since the same data-points can be endowed with different labelings while maintaining the same geometrical structure. Consider for instance gait analysis [12]. Each image sequence naturally possesses several different labels, such as: the identity of the moving person, the category of action performed, the viewpoint (when several cameras are present [9]), the emotional state, etc. When some a-priori information is available, in the form of partially labelled data or similarity classes, the most reasonable course of action is to try and learn in a supervised fashion the “best” distance function for the specific classification problem at hand. This topic has become quite popular in the last few years (see for instance [2, 3, 5, 22, 25, 30, 8]). Many unsupervised algorithms, in particular, take an input dataset and embed it in some other space, implicitly learning a metric (locally linear embedding [21], among others), but fail to learn a full metric for the whole input space. On the other hand, approaches that successfully reduce metric learning to constrained least-squares optimization in the linear case have been proposed [28, 23]. However, as even linear dynamical models live in a nonlinear space, the need for a principled way of learning Riemannian metrics from the data naturally arises.
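As a concrete illustration of the nearest-neighbor setting described above, here is a minimal sketch, not taken from the paper: the dictionary model representation, the classifier interface, and the naive Frobenius distance between transition matrices are assumptions made purely for illustration, standing in for any of the distances cited above.

```python
import numpy as np

# Minimal nearest-neighbor classifier over dynamical models, parameterized by
# an arbitrary distance function between models (hypothetical interface).
def nearest_neighbor_classify(test_model, train_models, train_labels, distance):
    dists = [distance(test_model, m) for m in train_models]
    return train_labels[int(np.argmin(dists))]

# Placeholder distance for illustration only: Frobenius norm between the
# transition matrices of two linear models. Any of the distances cited above
# (Martin's distance, subspace angles, gap metric, ...) could be plugged in.
def naive_distance(model_a, model_b):
    return np.linalg.norm(model_a["A"] - model_b["A"], ord="fro")

# Toy usage: two labelled "training" models and one query model.
train_models = [{"A": np.eye(2)}, {"A": 0.5 * np.eye(2)}]
train_labels = ["walk", "run"]
test_model = {"A": 0.9 * np.eye(2)}
print(nearest_neighbor_classify(test_model, train_models, train_labels, naive_distance))
# -> "walk" (its transition matrix is closest to the identity)
```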
An interesting tool for learning such Riemannian metrics is provided by the formalism of pullback metrics. If the models belong to a Riemannian manifold M, any diffeomorphism of M onto itself induces such a metric on M. By designing a suitable family of diffeomorphisms depending on a parameter p, we then obtain a family of pullback metrics on M over which we can optimize.
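For reference, the standard pullback construction can be stated as follows (a textbook definition, written here in terms of a parameterized diffeomorphism F_p of M and a base metric g, e.g. the Fisher metric; the notation is ours, not the paper's): for a point x in M and tangent vectors u, v in T_x M,
\[
(F_p^{*} g)_x(u, v) \;=\; g_{F_p(x)}\!\big( dF_p|_x\, u,\; dF_p|_x\, v \big),
\]
so that each value of the parameter p yields a different metric F_p^* g on M, and these form the family over which the optimization on the training set is carried out.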