Unsupervised Dynamic Texture Segmentation
Using Local Spatiotemporal Descriptors
Jie Chen, Guoying Zhao and Matti Pietikäinen
Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering,
P. O. Box 4500 FI-90014 University of Oulu, Finland
E-mail: {jiechen, gyzhao, mkp}@ee.oulu.fi
Abstract
Dynamic texture (DT) is an extension of texture to the
temporal domain. In this paper, we address the problem
of segmenting DT into disjoint regions in an
unsupervised way. Each region is characterized by
histograms of local binary patterns and contrast
computed in a spatiotemporal mode, which combine
the motion and appearance of DT. Experimental results show
that our method is effective in segmenting regions that
differ in their dynamics.
1. Introduction
Dynamic textures or temporal textures are textures with
motion [3, 6, 12]. Many DTs exist in the real world,
including sea waves, smoke, foliage, fire, showers and
whirlwinds. Potential applications of DT include
remote monitoring and various types of surveillance in
challenging environments, such as monitoring forest
fires to prevent natural disasters, traffic monitoring,
homeland security applications, and animal behavior
analysis for scientific studies [2].
Segmentation is one of the classical problems in
computer vision [1, 8, 10]. Meanwhile, the
segmentation of DTs is more challenging than the
static case because of their unknown spatiotemporal
extent. Existing approaches to DT segmentation can be
categorized into supervised and unsupervised methods.
Supervised segmentation requires a priori information
about the textures present, whereas unsupervised
segmentation does not; this makes it a very challenging
research problem. However, most recent methods still
need an initialization.
Examples of recent approaches are methods based on
mixtures of dynamic textures [2], mixtures of
linear models [4], multi-phase level sets [5],
Gauss-Markov models and level sets [6], Ising
descriptors [7], and optical flow [13].
A key problem of DT segmentation is how to
combine motion and appearance features. We notice
that the recently proposed feature, local binary patterns
in three orthogonal planes (LBP-TOP), has a promising
ability to describe both the appearance and motions of
DT [14]. It also appears to be robust to monotonic
gray-scale changes caused, e.g., by illumination
variations. In addition, Ojala and Pietikäinen used the
local binary pattern and contrast for the unsupervised
static texture segmentation and obtained good
performance [9]. Our proposed approach is based on
the work of [14] and [9].
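To make the building block concrete, the following sketch shows the basic LBP/C operator in the spirit of [9]: the 8-bit LBP code and the contrast C of a single 3x3 neighborhood. It is a simplified illustration in plain Python; the function name, the clockwise neighbour order, and the >=-thresholding convention are our assumptions, not specifics fixed by the paper.

```python
def lbp_and_contrast(patch):
    """Compute the 8-bit LBP code and contrast C of a 3x3 patch.

    patch: 3x3 list of lists of gray levels. Each of the 8 neighbours
    is thresholded against the centre pixel to build the binary code;
    C is the mean gray level of the neighbours at or above the centre
    minus the mean of those below it (the LBP/C pair of [9]).
    """
    center = patch[1][1]
    # Clockwise neighbour ring, starting from the top-left corner.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    above, below = [], []
    for bit, (r, c) in enumerate(coords):
        g = patch[r][c]
        if g >= center:
            code |= 1 << bit        # neighbour contributes a 1-bit
            above.append(g)
        else:
            below.append(g)
    # Contrast is undefined when all neighbours fall on one side;
    # we return 0 in that degenerate case.
    if above and below:
        contrast = sum(above) / len(above) - sum(below) / len(below)
    else:
        contrast = 0.0
    return code, contrast
```

For example, `lbp_and_contrast([[6, 5, 2], [7, 6, 1], [9, 8, 7]])` returns the code 241 (bits 0, 4, 5, 6, 7 set) together with the contrast of that neighborhood.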
In this paper, we generalize the frequently cited
method of Ojala and Pietikäinen to DT. Motivated by
[14], we also generalize the contrast of a single spatial
texture to a spatiotemporal mode (we call it C_TOP, i.e.,
contrast in three orthogonal planes). Combining
LBP-TOP and C_TOP, we call the generalized method
(LBP/C)_TOP. It is a theoretically and computationally
simple approach to model DT. We then use (LBP/C)_TOP
histograms for DT segmentation. The extracted
(LBP/C)_TOP features in a small local neighborhood
reflect the spatiotemporal properties of DT. To the best
of our knowledge, LBP methods have not been used
earlier for DT segmentation.
The rest of this paper is organized as follows. In
Section 2, we describe the generalized feature
(LBP/C)_TOP and how to use it for DT segmentation. In
Section 3, we show the detailed process of
segmentation. In Section 4, some experimental results
are presented, followed by discussion in Section 5.
2. Features for segmentation
In this section, after a brief review of LBP-TOP and
intensity contrast, we describe how to use the
generalized (LBP/C)_TOP for DT segmentation.
2.1 LBP-TOP/Contrast
LBP-TOP is a spatiotemporal descriptor [14]. As
shown in Fig. 1, (a) is a sequence of frames (or images)
of a DT; (b) denotes the three orthogonal planes or
slices XY, XT and YT, where XY is the appearance (or a
frame) of the DT, XT shows the visual impression of a
row changing in time, and YT describes the motion of a
column in temporal space; (c) shows how to compute
LBP and contrast for each pixel of these three planes.
Here, a binary code is produced by thresholding the
square neighborhood of a pixel in the XY, XT and YT
slices independently with the value of the center pixel;
(d) shows how to compute histograms by collecting the
occurrences of the different binary patterns from the
three slices, denoted as H_λ,π (λ=LBP and π=XY, XT,
YT). We encode DT by LBP using these three
sub-histograms to consider simultaneously the
appearance and the motions in two directions, i.e.,
incorporating spatial domain information and two
spatiotemporal co-occurrence statistics together.
Concatenating the three sub-histograms H_λ,π (λ=LBP
and π=XY, XT, YT) into a single histogram yields the
LBP-TOP feature histogram.
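The steps above, thresholding the square neighborhood of each interior pixel in the three slices and collecting the three 256-bin sub-histograms, can be sketched as follows. This is a simplified plain-Python illustration under our own assumptions (volume indexed as volume[t][y][x], a clockwise neighbour ring, a 3x3 neighborhood per plane), not the authors' implementation.

```python
def lbp_top_histograms(volume):
    """Sketch of LBP-TOP: 8-bit LBP histograms on the XY, XT and YT
    planes of a gray-level volume indexed as volume[t][y][x].

    For every interior voxel, the 8 neighbours in each of the three
    orthogonal planes are thresholded against the centre, giving one
    binary code per plane; occurrences go into three 256-bin
    sub-histograms, concatenated into one LBP-TOP feature histogram.
    """
    T, Y, X = len(volume), len(volume[0]), len(volume[0][0])
    hists = {"XY": [0] * 256, "XT": [0] * 256, "YT": [0] * 256}
    # Clockwise ring of (da, db) offsets within a 3x3 plane window.
    ring = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]

    def code(center, neighbours):
        # One bit per neighbour at or above the centre value.
        return sum(1 << i for i, g in enumerate(neighbours) if g >= center)

    for t in range(1, T - 1):
        for y in range(1, Y - 1):
            for x in range(1, X - 1):
                c = volume[t][y][x]
                xy = [volume[t][y + dy][x + dx] for dy, dx in ring]
                xt = [volume[t + dt][y][x + dx] for dt, dx in ring]
                yt = [volume[t + dt][y + dy][x] for dt, dy in ring]
                hists["XY"][code(c, xy)] += 1
                hists["XT"][code(c, xt)] += 1
                hists["YT"][code(c, yt)] += 1
    # Concatenate the three sub-histograms H_LBP,XY / H_LBP,XT / H_LBP,YT.
    return hists["XY"] + hists["XT"] + hists["YT"]
```

On a constant volume every neighbour equals its centre, so every plane produces the all-ones code 255 and each sub-histogram concentrates in its last bin, a quick sanity check that the thresholding convention is applied consistently across the three slices.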
The contrast measure C is the difference between the
average gray-level of those pixels which have value 1
and those which have value 0 (Fig. 1 (c)). Likewise, we
also compute the contrast in the three orthogonal planes,
978-1-4244-2175-6/08/$25.00 ©2008 IEEE