Image Representation by the Magnitude of the Discrete Gabor Wavelet Transform

Submitted to IEEE Transactions on Image Processing on 1999-10-26

Ingo J. Wundrich, Christoph von der Malsburg, Rolf P. Würtz

Abstract: We present an analytical study of the representation of images as the magnitudes of their transform with complex-valued Gabor wavelets. Although reconstruction of the image is difficult, such a representation is very useful for image understanding purposes. We show that, if the linear wavelet transform is sampled appropriately, the representation obtained through the magnitude nonlinearity is unique up to the sign for almost all images. Finally, numerical experiments with a phase retrieval algorithm show that recognizable versions of the original image can be reconstructed from the magnitudes.

Keywords: Gabor wavelets, feature extraction, phase retrieval, Gabor magnitudes, DFT, interpolation, image coding, visual cortex.

I. Introduction

Attempts to build image understanding systems are well advised to pay attention to what we know about biological vision. In the visual systems of humans or monkeys, an early stage is concerned with processing image information by convolution with some point spread function and ensuing non-linear operations that enhance edges and lead to some contrast normalization. A recognized problem [1] with edge-enhancing convolutions is their sensitivity to image shift, whereas one of the most important features of human vision is the positional invariance with which objects can be recognized. The two-dimensional power spectrum can represent images in a shift-invariant way, but it has the great drawback of being non-local, each component being influenced by all image pixels. At the other extreme, pixels as image features are maximally localized but achieve notoriously little in terms of analyzing image contents.
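The shift invariance of the power spectrum mentioned above can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes a circular (wrap-around) shift on a small test image, uses a naive DFT written with the Python standard library, and all names (`dft2`, `power_spectrum`, `circular_shift`, `max_abs_diff`, `image`) are hypothetical choices made for illustration.

```python
import cmath


def dft2(img):
    """Naive 2D DFT of a small image given as a list of equal-length rows."""
    n1, n2 = len(img), len(img[0])
    out = []
    for r1 in range(n1):
        row = []
        for r2 in range(n2):
            s = 0j
            for x1 in range(n1):
                for x2 in range(n2):
                    phase = -2 * cmath.pi * (r1 * x1 / n1 + r2 * x2 / n2)
                    s += img[x1][x2] * cmath.exp(1j * phase)
            row.append(s)
        out.append(row)
    return out


def power_spectrum(img):
    """Squared magnitude of every DFT coefficient."""
    return [[abs(c) ** 2 for c in row] for row in dft2(img)]


def circular_shift(img, d1, d2):
    """Shift the image by (d1, d2) pixels with wrap-around (an assumption)."""
    n1, n2 = len(img), len(img[0])
    return [[img[(x1 - d1) % n1][(x2 - d2) % n2] for x2 in range(n2)]
            for x1 in range(n1)]


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two 2D arrays."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Arbitrary 4x4 test image, chosen only for illustration.
image = [[0, 1, 2, 0],
         [3, 5, 1, 0],
         [0, 2, 4, 1],
         [1, 0, 0, 2]]
shifted = circular_shift(image, 1, 2)

# The pixel values change under the shift, the power spectrum does not
# (up to floating-point rounding).
pixel_diff = max_abs_diff(image, shifted)
ps_diff = max_abs_diff(power_spectrum(image), power_spectrum(shifted))
print(ps_diff < 1e-9, pixel_diff > 0)
```

A shift only multiplies each DFT coefficient by a unit-modulus phase factor, so the squared magnitudes agree. The same sum over all pixels in `dft2` also illustrates the non-locality: every coefficient depends on every pixel.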
We focus here on Gabor functions as an adjustable compromise between pixel representation and Fourier components. They seem to be implemented in the first stages of processing in the visual cortex of higher vertebrates, as the receptive fields of the so-called simple cells can be described to some accuracy as Gabor functions [2], [3]. There is also evidence that the magnitudes of the Gabor filter responses are calculated by another set of cells called complex cells [4]. The simplest model for these findings is that simple cell responses are calculated from the image intensities by a feedforward neural net, and that complex cells build on their information by another feedforward net. The complex cells, in turn, can be combined into more complicated feature detectors such as corner detectors [5]. They have also proven useful for higher image understanding tasks such as texture classification [6], face recognition [7], [8], and gesture recognition [9].

This work has been supported by grants from BMBF, ONR and ARO. The authors are at the Institut für Neuroinformatik, Ruhr-Universität Bochum, D-44780 Bochum, Germany. URL: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/ Christoph von der Malsburg is also at the Dept. of Computer Science, University of Southern California, Los Angeles, USA.

If the Gabor functions are arranged into a wavelet transform and the sampling is dense enough (see section II-D), then the original image can be recovered from the transform values with arbitrary quality (except for the DC value). Given the useful properties of the magnitudes of the Gabor transform, an important theoretical question is how much image information can be recovered from them. It must be noted that the study of reconstruction is not the most important issue for an object recognition system, where most of the visual information must be discarded.
Nevertheless, it is important to study the course of information through the system as thoroughly as possible in order to understand its mechanism. Additionally, our results may be of importance to various applications of Gabor wavelets.

In the following sections we first outline the theory of wavelets and frames necessary to understand reconstruction from the linear transform, introduce the Gabor wavelet transform, and review some literature on phase retrieval from power spectra of images. Then we present a formal proof that, given the right transform parameters and appropriate band-limitation, no image information is lost besides the DC value of the image and a global sign. The proof uses techniques from [10] and applies to all images except a possible subset of measure zero.

Finally, we explore the quality of reconstruction by a numerical implementation of the steps in the proof. The results are not perfect because much of the proof depends on transform values being exactly zero, which does not translate very well into numerical computation. However, for all images we tested, we were able to retrieve recognizable versions of either the image itself or its negative.

In this paper we use three different Fourier transforms, namely the continuous one (FT) $L^2(\mathbb{R}^2) \to L^2(\mathbb{R}^2)$, the 2D equivalent of Fourier series (DSFT) $L^2(\mathbb{Z}^2) \to L^2(U^2)$ with $U = [-\pi, \pi[$, and the completely discretized and finite version (DFT) $L^2(\mathcal{N}) \to L^2(\mathcal{N})$, where $\mathcal{N} = \{0, 1, \ldots, N_1 - 1\} \times \{0, 1, \ldots, N_2 - 1\}$. To keep the notation short we introduce $\tilde{\rho} = [\rho_1/N_1 \;\; \rho_2/N_2]^T$. For the sake of clarity, different symbols are used for all three transforma-