Dynamic Texture Similarity Criterion

Radek Richtr 1,2 and Michal Haindl 1,3
1 Institute of Information Theory and Automation of the CAS,
2 Faculty of Information Technology, CTU, Prague, Czech Republic
3 Faculty of Management, University of Economics, Jindřichův Hradec, Czech Republic
Email: {richtrad,haindl}@utia.cas.cz

Abstract—Dynamic texture similarity ranking is a challenging and still unsolved problem. Evaluating how similar various dynamic textures appear to human perception is extremely difficult even for static textures and requires tedious psycho-physical experiments. Human perception principles are still largely not understood, and dynamic texture perception is further complicated by the distinct ways in which the spatial and temporal domains are perceived, which complicates any similarity criterion definition. We propose a novel dynamic texture criterion based on the Fourier transformation and on properties of dynamic texture spatio-temporal frequencies. The presented criterion correlates well with the performed psycho-physical tests while maintaining sufficient diversity and descriptiveness.

I. INTRODUCTION

No rigorous mathematical texture definition exists for either static or dynamic (temporal, DT) textures. Dynamic textures (DT) can be vaguely defined as spatially repetitive motion patterns exhibiting homogeneous temporal properties. Examples include smoke, haze, fire, or liquids, as well as waving trees or straws, or some moving mechanical objects.

Mutual similarity assessment and similarity ranking of two or more visual textures is a difficult problem due to the complex dependencies of real material textures on 16 physical observation parameters [1]. Evaluating how well various texture models conform to human visual perception is important not only for assessing the similarity between a model output and the original measured texture, but also for the optimal setting of model parameters, for a fair comparison of distinct models, for material recognition, etc.
This problem is not satisfactorily solved even for simpler static textures [2], [3]. Currently the only reliable, but extremely impractical and expensive, option is to exploit the methods of visual psycho-physics. The psycho-physical methods [1] require a lengthy process of experiment design, tightly controlled laboratory conditions, and a representative panel of human test subjects. Such testing obviously cannot be performed on a daily basis. A few published static texture criteria allow testing selected texture properties such as texture regularity [4], etc. Others claim to test general texture quality [5]–[7]. Our recent test [2], performed on our texture fidelity benchmark (http://tfa.utia.cas.cz) with several state-of-the-art image quality measures and several recently published static texture criteria, confirms their insufficient reliability and low robustness. The evaluated criteria were the structural similarity (SSIM) index [8], the visual information fidelity (VIF) methods [9], the visual signal-to-noise ratio (VSNR) [10], the mean-squared error (MSE) [11], the complex wavelet structural similarity (CW-SSIM) index [12], and the structural texture similarity measures (STSIM-1, STSIM-2, STSIM-M) [6]; all of them are severely restricted to gray-scale textures only. The results have demonstrated [2] that the standard image quality criteria (MSE, VSNR, VIF, SSIM, CW-SSIM) do not correlate well with the human quality assessment of textures at all. Although the STSIM texture criteria have significantly higher correlation with human ranking, they do not successfully solve this problem either. Our textural qualitative criterion based on generative Markov texture model statistics ζ [3] is fully multispectral and slightly outperforms the best alternative, the STSIM fidelity criterion.
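As a concrete reference point for the weakest baseline discussed above, the mean-squared error between two equally sized gray-scale textures can be computed as follows. This is a minimal sketch of the standard MSE definition, not code from the cited benchmark; the function name and the toy data are illustrative assumptions.

```python
import numpy as np

def mse(texture_a, texture_b):
    """Mean-squared error between two equally sized gray-scale textures.

    MSE is one of the standard image quality baselines; as noted in the
    text, it correlates poorly with human texture quality assessment.
    """
    a = np.asarray(texture_a, dtype=np.float64)
    b = np.asarray(texture_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("textures must have identical shapes")
    return float(np.mean((a - b) ** 2))

# Toy example: a random texture compared with a noisy copy of itself.
rng = np.random.default_rng(0)
original = rng.uniform(0.0, 1.0, size=(64, 64))
noisy = original + rng.normal(0.0, 0.05, size=original.shape)
print(mse(original, original))  # 0.0 for identical textures
print(mse(original, noisy) > 0.0)
```

A perceptually blind criterion like this ranks any pixel-wise perturbation the same regardless of whether it disturbs the texture's visual structure, which is exactly the limitation the benchmark results expose.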
All the previous texture similarity criteria can be formally generalized to dynamic textures by applying them to the corresponding frame couples of the compared dynamic textures and subsequently combining these partial results. However, we can expect admissible ranking only for some oversimplified tests, such as identical dynamic textures which differ only in the additive noise level. Thus a novel, more robust criterion, which we present here, is clearly needed.

II. SIMILARITY CRITERION

The proposed DT similarity criterion is based on the three-dimensional Fourier transformation applied to each spectral band (see Fig. 1). The Fourier transformation of a function f(x₁, x₂, x₃) finds the spatial frequencies ξ = (ξ₁, ξ₂, ξ₃). The 3-dimensional Fourier transformation of the function f(x₁, x₂, x₃) can be written as

F{f}(ξ₁, ξ₂, ξ₃) = ∫_{R³} e^{-2πi ξ·x} f(x₁, x₂, x₃) dx₁ dx₂ dx₃ .

The harmonics are the complex exponentials e^{±2πi x·ξ} with three spatial frequencies ξ. In three dimensions a given ξ defines a family ξ·x = integer of parallel planes (of zero phase) in the x = (x₁, x₂, x₃) space. The normal to any of these planes is the vector ξ, and adjacent planes are a distance 1/‖ξ‖ apart. The exponential is periodic in the direction ξ with period 1/‖ξ‖ [13].

The combination of 2D and 1D Fourier transformations is used to detect the dynamics of significant local and global spatial frequencies. The crucial part of video similarity perception is the dynamic behavior of its structures [14]. This behavior is mainly described by a complex exponential whose normal ξ has a dominant temporal component ξ₃. Components ξ₃ similar to ξ₁,₂ are less recognizable. For simplicity (and because of the separability of the individual

2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, August 20-24, 2018. 978-1-5386-3787-6/18/$31.00 ©2018 IEEE
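The spatio-temporal frequency analysis above can be sketched with NumPy's FFT routines. This is a minimal illustration, not the authors' published criterion: the helper names, the magnitude-spectrum normalization, and the L1 distance between spectra are all assumptions made for the example.

```python
import numpy as np

def spatiotemporal_spectrum(dt_band):
    """3D Fourier magnitude spectrum of one spectral band of a dynamic
    texture, given as an array of shape (frames, height, width).

    After the transform, the first frequency axis carries the temporal
    component xi_3 that, per the text, dominates perceived dynamics.
    """
    dt_band = np.asarray(dt_band, dtype=np.float64)
    spectrum = np.fft.fftn(dt_band)           # 3D DFT over (t, y, x)
    return np.abs(np.fft.fftshift(spectrum))  # centred magnitudes

def spectrum_distance(band_a, band_b):
    """Illustrative distance between two dynamic-texture bands: the L1
    difference of their normalised 3D magnitude spectra (an assumed
    stand-in for the actual similarity criterion)."""
    sa = spatiotemporal_spectrum(band_a)
    sb = spatiotemporal_spectrum(band_b)
    sa /= sa.sum()
    sb /= sb.sum()
    return float(np.abs(sa - sb).sum())

# Toy dynamic texture: a drifting sinusoidal pattern and a noisy copy.
t, y, x = np.meshgrid(np.arange(16), np.arange(32), np.arange(32),
                      indexing="ij")
wave = np.sin(2 * np.pi * (0.10 * x + 0.07 * y + 0.05 * t))
rng = np.random.default_rng(1)
noisy = wave + rng.normal(0.0, 0.1, size=wave.shape)
print(spectrum_distance(wave, wave))   # 0.0 for identical textures
print(spectrum_distance(wave, noisy) > 0.0)
```

For a multispectral dynamic texture, such a per-band spectrum would be computed for each spectral band separately, matching the per-band application of the 3D transform described above.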