UCL DEPARTMENT OF COMPUTER SCIENCE

Research Note RN/11/06

The Fisher Kernel: A Brief Review

Martin Sewell

20 January 2011

Abstract

The basic idea behind the Fisher kernel method is to train a (generative) hidden Markov model (HMM) on data to derive a Fisher kernel for a (discriminative) support vector machine (SVM). The Fisher kernel gives a 'natural' similarity measure that takes into account the underlying probability distribution. If each data item is a (possibly varying-length) sequence, each may be used to train an HMM, with the average of the models in the training set used to construct a global HMM. It is then possible to calculate how much a new data item would 'stretch' the parameters of the existing model. This is achieved by calculating, for each of two data items, the gradient of the log-likelihood of the data item with respect to the model parameters, and comparing the two gradients. If these 'Fisher scores' are similar, the two data items would adapt the model in the same way: from the point of view of the given parametric model at the current parameter setting, they are similar in the sense that they would require similar adaptations to the parameters. This brief note introduces the Fisher score and the Fisher kernel, then reviews the literature on Fisher kernels.
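The mechanics described above can be sketched in a few lines of code. The sketch below uses a univariate Gaussian rather than an HMM purely to keep the model small; the Gaussian, its closed-form Fisher information, and the function names are illustrative assumptions, not part of the original note. The Fisher score is the gradient of the log-likelihood with respect to the model parameters, and the Fisher kernel is the inner product of two scores weighted by the inverse Fisher information:

```python
import numpy as np

def fisher_score(x, mu, sigma2):
    """Gradient of log N(x; mu, sigma2) w.r.t. the parameters (mu, sigma2).

    (Toy stand-in for the HMM log-likelihood gradient in the text.)
    """
    d_mu = (x - mu) / sigma2
    d_sigma2 = ((x - mu) ** 2 - sigma2) / (2 * sigma2 ** 2)
    return np.array([d_mu, d_sigma2])

def fisher_kernel(x, y, mu, sigma2, fisher_info_inv):
    """K(x, y) = U_x^T F^{-1} U_y, where U is the Fisher score."""
    u_x = fisher_score(x, mu, sigma2)
    u_y = fisher_score(y, mu, sigma2)
    return u_x @ fisher_info_inv @ u_y

# For the univariate Gaussian the Fisher information is diagonal:
# F = diag(1/sigma2, 1/(2*sigma2^2)), so its inverse is
mu, sigma2 = 0.0, 1.0
F_inv = np.diag([sigma2, 2 * sigma2 ** 2])

# Two data items with similar scores yield a large kernel value,
# i.e. they would 'stretch' the model parameters in the same direction.
k = fisher_kernel(0.5, 0.6, mu, sigma2, F_inv)
```

In practice the Fisher information is often replaced by the identity matrix, making the kernel a plain dot product of Fisher scores; the resulting kernel can then be handed to any standard SVM implementation.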