Dynamic Texture Segmentation

Gianfranco Doretto, Daniel Cremers, Paolo Favaro, Stefano Soatto

Dept. of Computer Science, UCLA, Los Angeles, CA 90095, {doretto, cremers, soatto}@cs.ucla.edu
Dept. of Electrical Eng., Washington University, St. Louis, MO 63130, fava@ee.wustl.edu

Abstract

We address the problem of segmenting a sequence of images of natural scenes into disjoint regions that are characterized by constant spatio-temporal statistics. We model the spatio-temporal dynamics in each region by Gauss-Markov models, and infer the model parameters as well as the boundary of the regions in a variational optimization framework. Numerical results demonstrate that, in contrast to purely texture-based segmentation schemes, our method is effective in segmenting regions that differ in their dynamics even when spatial statistics are identical.

1. Introduction

Consider the following problem, in relation to Fig. 1: An autonomous vehicle must decide what is traversable terrain (e.g. grass) and what is not (e.g. water). This problem can be addressed by classifying portions of the image into a number of categories, for instance grass, dirt, bushes or water. For the most part, such a classification can be accomplished successfully by looking at simple image statistics, such as color or intensity. However, in many situations these are not sufficient, and therefore it may be beneficial to look at spatio-temporal statistics, and attempt to classify different portions of the scene based not on the statistics of one single image, but on how the statistics of an image change over time during a sequence. Modeling the (global) spatio-temporal statistics of the entire image can be a daunting task due to the complexity of natural scenes. An alternative consists of choosing a simple class of models, and simultaneously estimating regions and their model parameters in such a way that the data in each region is optimally modeled by the estimated parameters.
This naturally results in a segmentation problem.

In this paper we study the problem of segmenting a sequence of images based on a simple model of its spatio-temporal statistics.

Figure 1. A typical outdoor scene: an autonomous vehicle trying to classify the terrain based on spatial image statistics fails to distinguish water from grass, since the latter reflects on the former and therefore their spatial statistics are very similar (courtesy of Google Image Search).

Before we proceed with formalizing the problem, we would like to point out that segmentation, in this context, is entirely dependent on the class of models chosen. Different models result in different partitions of the scene, and there is no “right” or “wrong” result. Ultimately, the usefulness of a statistical segmentation method depends on how well the chosen model captures the phenomenology of the physical scene, but unless one has a physical model to start with, this correspondence cannot be guaranteed. Therefore, in Sect. 1.1 we describe the model we use, which implicitly defines what we mean by “segmentation”. It is a Gauss-Markov model of the intensity of the pixels which is known as a dynamic texture.

Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003) 2-Volume Set 0-7695-1950-4/03 $17.00 © 2003 IEEE

Another issue that we would like to raise at the outset is that what we model is not a point process, but rather statistical distributions both in space and in time. Therefore, there will be a “minimum region of integration” in order to