Clustering-based construction of Hidden Markov Models for Generative Kernels

M. Bicego 1,2⋆, M. Cristani 1,2, V. Murino 1,2, E. Pękalska 3, and R.P.W. Duin 4

1 Computer Science Department, University of Verona, Italy
2 Istituto Italiano di Tecnologia (IIT), Italy
3 School of Computer Science, University of Manchester, UK
4 Delft University of Technology, The Netherlands

Abstract. Generative kernels are theoretically grounded tools that increase the capabilities of generative classification through a discriminative setting. The Fisher kernel is the first and most widely used representative, and rests on a thoroughly investigated mathematical background. A generative kernel is built through a two-step pipeline. In the first, "generative" step, a generative model is trained, using either one model per class or a single model for all the data; features or scores are then extracted, encoding the contribution of each data point to the generative process. In the second, "discriminative" step, the scores are fed to a discriminative machine via a kernel, exploiting the separability of the data. In this paper we contribute to the first step, proposing a novel way to fit the class data with generative models, focusing specifically on Hidden Markov Models (HMMs). The idea is to perform model clustering on the unlabeled data in order to best discover the structure of the entire sample set. The label information is then retrieved and the generative scores are computed. Comparative experiments provide a preliminary assessment of the merits of the novel approach, encouraging further developments.

1 Introduction

Hidden Markov Models (HMMs) are a powerful and versatile statistical learning framework. In classical HMM-based classification a single HMM is built for each class, and the Maximum A Posteriori (MAP) approach is used to classify an unlabeled sequence O, following a purely generative classification scheme.
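As a rough illustration of the two-step pipeline sketched in the abstract, the snippet below uses per-class log-likelihoods as generative scores in place of the Fisher kernel's gradient-based scores; the discrete-output HMM parameters (pi, A, B) and the toy sequences are hypothetical, chosen only to keep the example self-contained.

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """log P(O | lambda) for a discrete-output HMM, computed with the
    scaled forward algorithm (scaling keeps the recursion stable)."""
    alpha = pi * B[:, obs[0]]            # forward variables at t = 0
    log_lik = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()                  # scaling factor
        log_lik += np.log(c)
        alpha = (alpha / c) @ A * B[:, obs[t]]
    return log_lik + np.log(alpha.sum())

# Step 1 ("generative"): one hypothetical 2-state model per class,
# each given as (initial distribution pi, transitions A, emissions B).
model_1 = (np.array([0.6, 0.4]),
           np.array([[0.7, 0.3], [0.4, 0.6]]),
           np.array([[0.9, 0.1], [0.2, 0.8]]))
model_2 = (np.array([0.5, 0.5]),
           np.array([[0.2, 0.8], [0.6, 0.4]]),
           np.array([[0.3, 0.7], [0.8, 0.2]]))
models = [model_1, model_2]

# Score extraction: each sequence is mapped to the vector of its
# log-likelihoods under all class models.
sequences = [np.array([0, 0, 1, 0]), np.array([1, 1, 0, 1])]
Phi = np.array([[hmm_log_likelihood(o, *m) for m in models]
                for o in sequences])

# Step 2 ("discriminative"): a kernel on the score vectors, here a
# plain linear kernel, ready for an SVM or another kernel machine.
K = Phi @ Phi.T
```

Replacing the per-class log-likelihoods with gradients of the log-likelihood with respect to the model parameters would give the Fisher score space proper; the two-step structure is unchanged.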
Even though the MAP rule is the theoretically optimal decision rule (i.e. it leads to the minimum probability of error [1]), in practice generative classification may suffer from poor discriminative abilities. This is likely to occur in the case of poorly estimated class models (e.g. due to insufficient training examples), improper model topologies (e.g. due to a bad model definition or conditional dependence of the states), or possible class overlap (as may occur

⋆ Corresponding Author. Address: Strada Le Grazie, 15 - 37134 Verona (Italy). Tel: +39 045 8027072 - Fax: +39 045 8027068 - e-mail: manuele.bicego@univr.it
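Stated explicitly, the MAP rule assigns a sequence O to the class whose model is a posteriori most probable; with class models λ_1, …, λ_C and class priors P(λ_c), Bayes' rule gives

```latex
\hat{c} \;=\; \arg\max_{c}\, P(\lambda_c \mid O)
        \;=\; \arg\max_{c}\, P(O \mid \lambda_c)\, P(\lambda_c),
```

since the evidence P(O) is common to all classes; each likelihood P(O | λ_c) is computed with the forward algorithm.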