Appears in IEEE Conf. on Computer Vision and Pattern Recognition, Minneapolis, 2007.

Classifying Video with Kernel Dynamic Textures

Antoni B. Chan and Nuno Vasconcelos
Department of Electrical and Computer Engineering
University of California, San Diego
abchan@ucsd.edu, nuno@ece.ucsd.edu

Abstract

The dynamic texture is a stochastic video model that treats the video as a sample from a linear dynamical system. The simple model has been shown to be surprisingly useful in domains such as video synthesis, video segmentation, and video classification. However, one major disadvantage of the dynamic texture is that it can only model video where the motion is smooth, i.e. video textures where the pixel values change smoothly. In this work, we propose an extension of the dynamic texture to address this issue. Instead of learning a linear observation function with PCA, we learn a non-linear observation function using kernel PCA. The resulting kernel dynamic texture is capable of modeling a wider range of video motion, such as chaotic motion (e.g. turbulent water) or camera motion (e.g. panning). We derive the necessary steps to compute the Martin distance between kernel dynamic textures, and then validate the new model through classification experiments on video containing camera motion.

1. Introduction

The dynamic texture [1] is a generative stochastic model of video that treats the video as a sample from a linear dynamical system. Although simple, the model has been shown to be surprisingly useful in domains such as video synthesis [1, 2], video classification [3, 4, 5], and video segmentation [6, 7, 8, 2]. Despite these numerous successes, one major disadvantage of the dynamic texture is that it can only model video where the motion is smooth, i.e. video textures where the pixel values change smoothly.
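As a concrete illustration, the dynamic texture generates video by sampling a linear dynamical system: a hidden state evolves as x_{t+1} = A x_t + v_t, and each frame is produced by a linear observation y_t = C x_t + w_t, with Gaussian noise terms v_t and w_t. The following sketch samples such a system; all dimensions and parameter values here are illustrative assumptions for the example, not parameters learned from data.

```python
import numpy as np

# Sketch of sampling a linear dynamical system (dynamic texture):
#   x_{t+1} = A x_t + v_t,  v_t ~ N(0, Q)   (linear state transition)
#   y_t     = C x_t + w_t,  w_t ~ N(0, R)   (linear observation)
rng = np.random.default_rng(0)
n, m, T = 5, 20, 50                 # state dim, pixel dim, number of frames

A = 0.9 * np.eye(n)                 # state-transition matrix (stable: |eig| < 1)
C = rng.standard_normal((m, n))     # observation matrix (a PCA basis, in the standard model)
Q = 0.01 * np.eye(n)                # state noise covariance
R = 0.001 * np.eye(m)               # observation noise covariance

x = rng.multivariate_normal(np.zeros(n), Q)   # initial state
frames = []
for t in range(T):
    y = C @ x + rng.multivariate_normal(np.zeros(m), R)   # render one frame
    frames.append(y)
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)   # advance the state

video = np.stack(frames)            # synthesized "video": T frames of m pixels
print(video.shape)                  # (50, 20)
```

Because both the transition and the observation are linear, frames evolve smoothly along a linear subspace of pixel space; this is the limitation the kernel dynamic texture relaxes.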
This limitation stems from the linear assumptions of the model: specifically, 1) the linear state-transition function, which models the evolution of the hidden state-space variables over time; and 2) the linear observation function, which maps the state-space variables into observations. As a result, the dynamic texture cannot model more complex motion, such as chaotic motion (e.g. turbulent water) or camera motion (e.g. panning, zooming, and rotations).

To some extent, the smoothness limitation of the dynamic texture has been addressed in the literature by modifying the linear assumptions of the dynamic texture model. For example, [9] keeps the linear observation function, while modeling the state transitions with a closed-loop dynamic system. In contrast, [10, 11] utilize a non-linear observation function, modeled as a mixture of linear subspaces, while keeping the standard linear state transitions. Similarly, in [12], different views of a video texture are represented by a non-linear observation function that models the video texture manifold from different camera viewpoints. Finally, [7] treats the observation function as a piece-wise linear function that changes over time, but the result is not a generative model.

In this paper, we improve the modeling capability of the dynamic texture by using a non-linear observation function, while maintaining the linear state transitions. In particular, instead of using PCA to learn a linear observation function, as with the standard dynamic texture, we use kernel PCA to learn a non-linear observation function. The resulting kernel dynamic texture is capable of modeling a wider range of video motion.

The contributions of this paper are three-fold. First, we introduce the kernel dynamic texture and describe a simple algorithm for learning its parameters. Second, we show how to compute the Martin distance between kernel dynamic textures, and hence introduce a similarity measure for the new model.
Third, we build a video classifier based on the kernel dynamic texture and the Martin distance, and evaluate the efficacy of the model through a classification experiment on video containing camera motion. We begin the paper with a brief review of kernel PCA, followed by each of the three contributions listed above.

2. Kernel PCA

Kernel PCA [13] is the kernelized version of standard PCA [14]. With standard PCA, the data is projected onto the linear subspace (linear principal components) that best captures the variability of the data. In contrast, kernel PCA (KPCA) projects the data onto non-linear functions in the input space. These non-linear principal components are de-