PERCEPTUAL NORMALIZED INFORMATION DISTANCE FOR IMAGE DISTORTION ANALYSIS BASED ON KOLMOGOROV COMPLEXITY

Nima Nikvand and Zhou Wang

Dept. of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
Email: nnikvand@uwaterloo.ca, zhouwang@ieee.org

ABSTRACT

Image distortion analysis is a fundamental issue in many image processing problems, including compression, restoration, recognition, classification, and retrieval. In this work, we investigate the problem of image distortion measurement based on the theories of Kolmogorov complexity and normalized information distance (NID), which have rarely been studied in the context of image processing. Based on a wavelet domain Gaussian scale mixture model of images, we approximate NID using a Shannon entropy based method. This leads to a series of novel distortion measures that are competitive with state-of-the-art image quality assessment approaches.

1. INTRODUCTION

The Normalized Information Distance (NID) has been shown to be a valid and universal distance metric applicable to the similarity measurement of any two objects [1]. Like Kolmogorov complexity, the NID is non-computable, and a practical solution is to approximate it using the Normalized Compression Distance (NCD) [1, 2], which has led to impressive results in many applications, such as the construction of phylogeny trees using DNA sequences [1]. However, NCD did not achieve the same level of success in image similarity applications [3, 4]. A framework of Normalized Conditional Compression Distance (NCCD) was proposed in [3], which shows significantly wider applicability than existing image similarity/distortion measures.

The Kolmogorov complexity of an object may also be approximated using Shannon entropy, given that the object is from an ergodic stationary source [5]. The difficulty is that data arising in practice in the form of images or complex video signals cannot generally be considered stationary processes.
In fact, all images are non-stationary sources in the spatial domain, and it is cumbersome to replace Kolmogorov complexity with Shannon entropy without any advanced transformation and modeling. In this paper we propose a framework that takes the reference image and the distorted image into the wavelet domain and assumes local independence among image subbands in order to approximate Kolmogorov complexity by Shannon entropy. Inspired by [6], a Gaussian Scale Mixture (GSM) model of Natural Scene Statistics (NSS) is adopted to further simplify the entropy calculations.

2. KOLMOGOROV COMPLEXITY BASED INFORMATION DISTANCES

The Kolmogorov complexity [7] of an object $x$ is defined as the length of the shortest program that can produce that object on a universal Turing machine and halt:

$$K(x) = \min_{p \,:\, U(p) = x} l(p),$$

where $U$ denotes the universal Turing machine and $l(p)$ is the length of program $p$. In [1], the authors assume the existence of a general decompressor that can be used to decompress the presumably shortest program $x^*$ to the desired object $x$. However, they note that, due to the non-computability of this concept, a compressor that does the opposite does not have to exist.

The conditional Kolmogorov complexity of $x$ relative to $y$ is denoted by $K(x|y)$. An information distance between $x$ and $y$ can then be defined as $\max\{K(x|y), K(y|x)\}$, which is the larger of the lengths of the shortest programs that compute $x$ from $y$ and $y$ from $x$. To convert it to a normalized symmetric metric, a novel NID measure was introduced in [1]:

$$\mathrm{NID}(x, y) = \frac{\max\{K(x|y^*),\, K(y|x^*)\}}{\max\{K(x),\, K(y)\}}. \tag{1}$$

It was proved that NID is a valid distance metric that satisfies the identity and symmetry axioms and the triangle inequality [1]. The real-world application of NID is difficult because Kolmogorov complexity is a non-computable quantity [7].
By using the fact that $K(xy) = K(y|x^*) + K(x) = K(x|y^*) + K(y)$ (subject to a logarithmic term), and by approximating the Kolmogorov complexity $K$ using a practical data compressor $C$, a normalized compression distance (NCD) was proposed in [1] as:

$$\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}. \tag{2}$$

NCD has been proved to be an effective approximation of NID and achieves superior performance in bioinformatics applications such as the construction of phylogeny trees using DNA sequences [1].

Presented at International Conference on Applied Mathematics, Modeling and Computational Science (AMMCS), Waterloo, ON, Canada, July 2011