Dynamic Texture Similarity Criterion
Radek Richtr (1,2) and Michal Haindl (1,3)
(1) Institute of Information Theory and Automation of the CAS,
(2) Faculty of Information Technology, CTU, Prague, Czech Republic
(3) Faculty of Management, University of Economics, Jindřichův Hradec, Czech Republic
Email: {richtrad,haindl}@utia.cas.cz
Abstract—Dynamic texture similarity ranking is a challenging and still unsolved problem. Evaluating how similar various dynamic textures appear from the human perception point of view is extremely difficult even for static textures and requires tedious psycho-physical experiments. Human perception principles are still largely not understood, and dynamic texture perception is further complicated by the distinct ways in which the spatial and temporal domains are perceived, which complicates any similarity criterion definition. We propose a novel dynamic texture criterion based on the Fourier transformation and the properties of dynamic texture spatio-temporal frequencies. The presented criterion correlates well with performed psycho-physical tests while maintaining sufficient diversity and descriptiveness.
I. INTRODUCTION
No rigorous mathematical definition of either static or dynamic (also called temporal, DT) textures exists. Dynamic textures (DT) can be vaguely defined as spatially repetitive motion patterns exhibiting homogeneous temporal properties. Examples include smoke, haze, fire, or liquids, as well as waving trees or straws, or some moving mechanical objects.
Mutual similarity assessment and similarity ranking of two or more visual textures is a difficult problem due to the complex dependence of real material textures on 16 physical observation parameters [1]. Evaluating how well various texture models conform to human visual perception is important not only for assessing the similarity between a model output and the original measured texture, but also for the optimal setting of model parameters, for a fair comparison of distinct models, for material recognition, etc. This problem is not satisfactorily solved even for the simpler static textures [2], [3]. Currently the only reliable, but extremely impractical and expensive, option is to exploit the methods of visual psycho-physics. The psycho-physical methods [1] require a lengthy process of experiment design, tightly controlled laboratory conditions, and a representative panel of human testing subjects. Such testing obviously cannot be performed on a daily basis.
A few published static texture criteria allow testing of selected texture properties, such as texture regularity [4], etc. Others claim to test general texture quality [5]–[7]. Our recent test [2] of several state-of-the-art image quality measures and several recently published static texture criteria on our texture fidelity benchmark (http://tfa.utia.cas.cz) confirms their insufficient reliability and low robustness. The evaluated criteria were the structural similarity (SSIM) index [8], the visual information fidelity (VIF) methods [9], the visual signal-to-noise ratio (VSNR) [10], the mean-squared error (MSE) [11], the complex wavelet structural similarity (CW-SSIM) index [12], and the structural texture similarity measures (STSIM-1, STSIM-2, STSIM-M) [6]; all of them are severely restricted to gray-scale textures only. The results have demonstrated [2] that the standard image quality criteria (MSE, VSNR, VIF, SSIM, CW-SSIM) do not correlate well with the human quality assessment of textures at all. Although the STSIM texture criteria have a significantly higher correlation with human ranking, they do not successfully solve this problem. Our textural quality criterion based on generative Markov texture model statistics ζ [3] is fully multispectral and slightly outperforms the best alternative, the STSIM fidelity criterion. All previous texture similarity criteria can be formally generalized to dynamic textures by applying them to the corresponding frame couples of the compared dynamic textures and subsequently combining these partial results. However, we can expect admissible ranking only for some oversimplified tests, such as identical dynamic textures which differ only in additive noise level. Thus a novel, more robust criterion, which we present here, is clearly needed.
II. SIMILARITY CRITERION
The proposed DT similarity criterion is based on the three-dimensional Fourier transformation applied to each spectral band (see Fig. 1). The Fourier transformation of a function $f(x_1, x_2, x_3)$ finds the spatial frequencies $\xi = (\xi_1, \xi_2, \xi_3)$. The 3-dimensional Fourier transformation of the function $f(x_1, x_2, x_3)$ can be written as:

$\mathcal{F}\{f\}(\xi_1, \xi_2, \xi_3) = \int_{\mathbb{R}^3} e^{-2\pi i\, \xi \cdot x}\, f(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 .$
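As a minimal sketch of this building block, the per-band 3D Fourier spectrum of a discretized dynamic texture can be computed with NumPy. The array layout (frames, height, width, bands), the function name `band_spectra`, and the toy one-harmonic texture are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def band_spectra(dt):
    """Return the 3D DFT magnitude for each spectral band of a dynamic texture.

    dt: ndarray of shape (T, H, W, C) -- T frames, C spectral bands.
    """
    # One 3D FFT per band, taken over the temporal and both spatial axes.
    return np.stack(
        [np.abs(np.fft.fftn(dt[..., c])) for c in range(dt.shape[-1])],
        axis=-1,
    )

# Toy single-band "dynamic texture": one moving spatio-temporal harmonic.
T, H, W = 8, 16, 16
t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
dt = np.cos(2 * np.pi * (2 * x / W + 1 * t / T))[..., None]

spec = band_spectra(dt)
# The spectrum concentrates all energy at the harmonic's frequency
# (temporal xi_3 = 1, spatial xi_1 = 0, xi_2 = 2) and its complex conjugate.
```

For a real cosine the DFT magnitude at the harmonic's bin equals $THW/2$, which makes such a toy texture a convenient sanity check for the axis ordering.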
The harmonics are the complex exponentials $e^{\pm 2\pi i\, x \cdot \xi}$ with three spatial frequencies $\xi$. In three dimensions, a given $\xi$ defines a family $\xi \cdot x = \text{integer}$ of parallel planes (of zero phase) in the $x = (x_1, x_2, x_3)$ space. The normal to any of these planes is the vector $\xi$, and adjacent planes are a distance $1/\|\xi\|$ apart. The exponential is periodic in the direction $\xi$ with period $1/\|\xi\|$ [13]. A combination of 2D and 1D Fourier transformations is used to detect the dynamics of significant local and global spatial frequencies. The crucial part of video similarity perception is the dynamic behavior of its structures [14]. This behavior is mainly described by a complex exponential whose normal $\xi$ has a dominant temporal component $\xi_3$.
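The combination of 2D and 1D transformations rests on the separability of the Fourier transform: a 2D spatial FFT applied frame by frame, followed by a 1D FFT along the time axis, yields exactly the coefficients of the full 3D FFT. A minimal NumPy check (the random single-band sequence is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
seq = rng.standard_normal((8, 16, 16))   # (T, H, W) single-band sequence

spatial = np.fft.fft2(seq, axes=(1, 2))  # 2D FFT of each frame
separable = np.fft.fft(spatial, axis=0)  # then 1D FFT across the frames

direct = np.fft.fftn(seq)                # full 3D FFT in one call

# Separability of the DFT: both orders of computation agree.
assert np.allclose(separable, direct)
```

This equivalence is what allows the criterion to inspect spatial frequencies per frame and their temporal evolution separately without losing information.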
Components $\xi_3$ similar to $\xi_{1,2}$ are less recognizable. For simplicity (and because of the separability of the individual
2018 24th International Conference on Pattern Recognition (ICPR)
Beijing, China, August 20-24, 2018
978-1-5386-3787-6/18/$31.00 ©2018 IEEE